

WORLD INTELLECTUAL PROPERTY ORGANIZATION
International Bureau

INTERNATIONAL APPLICATION PUBLISHED UNDER THE PATENT COOPERATION TREATY (PCT)




(71) Applicant: CHIRON CORPORATION [US/US]; 4560 Horton Street,
Emeryville, CA 94608 (US).

(72) Inventors: HOUGHTON, Michael; 53 Rosemead Court, Danville, CA 94526
(US). CHOO, Qui-Lim; 5700 Fern Street, El Cerrito, CA 94530 (US).
KUO, George; 1370 Sixth Avenue, San Francisco, CA 94112 (US).

(74) Agents: CIOTTI, Thomas, E. et al.; Irell & Manella, 545 Middlefield
Road, Suite 200, Menlo Park, CA 94025-3471 (US).



(51) International Patent Classification: C12N 9/50, 15/51, 15/57,
C12N 15/62, C12Q 1/37

A1

(11) International Publication Number: WO 91/15575

(43) International Publication Date: 17 October 1991 (17.10.91)

(21) International Application Number: PCT/US91/02210

(22) International Filing Date: 4 April 1991 (04.04.91)

(30) Priority data: 505,433   4 April 1990 (04.04.90)   US



(81) Designated States: AT (European patent), AU, BB, BE (European
patent), BF (OAPI patent), BG, BJ (OAPI patent), BR, CA, CF (OAPI
patent), CG (OAPI patent), CH, CH (European patent), CM (OAPI patent),
DE (European patent), DK (European patent), ES (European patent), FI,
FR (European patent), GA (OAPI patent), GB (European patent), GR
(European patent), HU, IT (European patent), JP, KP, KR, LK, LU
(European patent), MC, MG, ML (OAPI patent), MR (OAPI patent), MW, NL
(European patent), NO, PL, RO, SD, SE (European patent), SN (OAPI
patent), SU, TD (OAPI patent), TG (OAPI patent).



Published 

With international search report 

Before the expiration of the time limit for amending the 
claims and to be republished in the event of the receipt of 
amendments. 



(54) Title: HEPATITIS C VIRUS PROTEASE

(57) Abstract

The protease necessary for polyprotein processing in Hepatitis C virus is
identified, cloned, and expressed. Proteases, truncated protease, and
altered proteases are disclosed which are useful for cleavage of specific
polypeptides, and for assay and design of antiviral agents specific for HCV.

Cross-Reference to Related Applications

This application is a continuation-in-part application of U.S. Serial 
No. 07/505,433, filed on 4 April 1990. 



Technical Field 

This invention relates to the molecular biology and virology of the
hepatitis C virus (HCV). More specifically, this invention relates to a novel
protease produced by HCV, methods of expression, recombinant protease,
protease mutants, and inhibitors of HCV protease.

example, PCT WO89/046699; U.S. Patent Application Serial No. 7/456,637, filed
21 December 1989; and U.S. Patent Application Serial No. 7/456,637, filed 21
December 1989, incorporated herein by reference. Hepatitis C appears to be the
major form of transfusion-associated hepatitis in a number of countries, including
the United States and Japan. There is also evidence implicating HCV in
induction of hepatocellular carcinoma. Thus, a need exists for an effective
method for treating HCV infection: currently, there is none.

Many viruses, including adenoviruses, baculoviruses, comoviruses, 
picornaviruses, retroviruses, and togaviruses, rely on specific, virally-encoded pro- 
teases for processing polypeptides from their initial translated form into mature, 
active proteins. In the case of picornaviruses, all of the viral proteins are believed 
to arise from cleavage of a single polyprotein (B.D. Korant, CRC Crit Rev 

S. Pichuantes et al, in "Viral Proteinases As Targets For Chemotherapy"
(Cold Spring Harbor Laboratory, 1989) pp. 215-22, disclosed expression
of a viral protease found in HIV-1. The HIV protease was obtained in the form
of a fusion protein, by fusing DNA encoding an HIV protease precursor to DNA
encoding human superoxide dismutase (hSOD), and expressing the product in E.
coli. Transformed cells expressed products of 36 and 10 kDa (corresponding to
the hSOD-protease fusion protein and the protease alone), suggesting that the
protease was expressed in a form capable of autocatalytic proteolysis.

T.J. McQuade et al, Science (1990) 247:454-56 disclosed preparation
of a peptide mimic capable of specifically inhibiting the HIV-1 protease. In HIV,
the protease is believed responsible for cleavage of the initial p55 gag precursor
to form the core structural proteins (p17, p24, p8, and p7). Adding 1 μM
inhibitor to HIV-infected peripheral blood lymphocytes in culture reduced the
concentration of processed HIV p24 by about 70%. Viral maturation and levels
of infectious virus were

Disclosure of the Invention

We have now invented recombinant HCV protease, HCV protease 
fusion proteins, truncated and altered HCV proteases, cloning and expression
vectors therefor, and methods for identifying antiviral agents effective for treating
HCV. 



Brief Description of the Drawings 

Figure 1 shows the sequence of HCV protease.

Figure 2 shows the polynucleotide sequence and deduced amino
acid sequence of the clone C20c.

Figure 3 shows the polynucleotide sequence and deduced amino
acid sequence of the clone C26d.

Figure 4 shows the polynucleotide sequence and deduced amino
acid sequence of the clone C8h.

Figure 5 shows the polynucleotide sequence and deduced amino
acid sequence of the clone C7f.

Figure 6 shows the polynucleotide sequence and deduced amino
acid sequence of the clone C31.

Figure 7 shows the polynucleotide sequence and deduced amino
acid sequence of the clone C35.

Figure 8 shows the polynucleotide sequence and deduced amino
acid sequence of the clone C33c.

Figure 9 schematically illustrates C7fC20cC360C200.

Figure 10 shows the sequence of vector cf1SODp600.

Modes of Carrying Out The Invention
A. Definitions 

The terms "Hepatitis C Virus" and "HCV" refer to the viral species
that is the major etiological agent of BB-NANBH, the prototype isolate of which
is identified in PCT WO89/046699; EPO publication 318,216; USSN 7/355,008,
filed 18 May 1989; and USSN 7/456,637, the disclosures of which are
incorporated herein by reference. "HCV" as used herein includes the pathogenic
strains capable of causing hepatitis C, and attenuated strains or defective
interfering particles derived therefrom. The HCV genome is comprised of RNA.
It is known that RNA-containing viruses have relatively high rates of spontaneous
mutation, reportedly on the order of 10^-3 to 10^-4 per incorporated nucleotide
(Fields & Knipe, "Fundamental Virology" (1986, Raven Press, N.Y.)). As
heterogeneity and fluidity of genotype are inherent characteristics of RNA viruses,
there will be multiple strains/isolates, which may be virulent or avirulent, within
the HCV species.

Information on several different strains/isolates of HCV is disclosed
herein, particularly strain or isolate CDC/HCV1 (also called HCV1). Information
from one strain or isolate, such as a partial genomic sequence, is sufficient to
allow those skilled in the art using standard techniques to isolate new
strains/isolates and to identify whether such new strains/isolates are HCV. For
example, several different strains/isolates are described below. These strains,
which were obtained from a number of human sera (and from different
geographical areas), were isolated utilizing the information from the genomic
sequence of HCV1.

The information provided herein suggests that HCV may be distantly
related to the Flaviviridae. The Flavivirus family contains a large number of
viruses which are small, enveloped pathogens of man. The morphology and
composition of Flavivirus particles are known, and are discussed in M.A. Brinton, in
"The Viruses: The Togaviridae And Flaviviridae" (Series eds. Fraenkel-Conrat and
Wagner, vol. eds. Schlesinger and Schlesinger, Plenum Press, 1986), pp. 327-374.
Generally, with respect to morphology, Flaviviruses contain a central nucleocapsid
surrounded by a lipid bilayer. Virions are spherical and have a diameter of about
40-50 nm. Their cores are about 25-30 nm in diameter. Along the outer surface
of the virion envelope are projections measuring about 5-10 nm in length with
terminal knobs about 2 nm in diameter. Typical examples of the family include
Yellow Fever virus, West Nile virus, and Dengue Fever virus. They possess
positive-stranded RNA genomes (about 11,000 nucleotides) that are slightly larger
than that of HCV and encode a polyprotein precursor of about 3500 amino acids.
Individual viral proteins are cleaved from this precursor polypeptide.

The genome of HCV appears to be single-stranded RNA containing
about 10,000 nucleotides. The genome is positive-stranded, and possesses a
continuous translational open reading frame (ORF) that encodes a polyprotein of
about 3,000 amino acids. In the ORF, the structural proteins appear to be
encoded in approximately the first quarter of the N-terminal region, with the
majority of the polyprotein attributed to non-structural proteins. When compared
with all known viral sequences, small but significant co-linear homologies are
observed with the non-structural proteins of the Flavivirus family, and with the
pestiviruses (which are now also considered to be part of the Flavivirus family).
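
The sizes quoted above can be cross-checked with simple codon arithmetic. The following is a minimal sketch, assuming three nucleotides per codon plus one stop codon and treating the approximate figures in the text ("about 10,000 nucleotides", "about 3,000 amino acids") as exact values. The function and constant names are illustrative only and do not come from the specification.

```python
def coding_nucleotides(amino_acids: int, include_stop: bool = True) -> int:
    """Nucleotides needed to encode a polypeptide: 3 per codon, plus an optional stop codon."""
    return 3 * amino_acids + (3 if include_stop else 0)


# Approximate figures quoted in the text, treated as exact for illustration.
HCV_GENOME_NT = 10_000
HCV_POLYPROTEIN_AA = 3_000
FLAVIVIRUS_GENOME_NT = 11_000
FLAVIVIRUS_POLYPROTEIN_AA = 3_500

hcv_orf_nt = coding_nucleotides(HCV_POLYPROTEIN_AA)
flavivirus_orf_nt = coding_nucleotides(FLAVIVIRUS_POLYPROTEIN_AA)

# Genome length not accounted for by the ORF (untranslated sequence).
hcv_noncoding_nt = HCV_GENOME_NT - hcv_orf_nt
flavivirus_noncoding_nt = FLAVIVIRUS_GENOME_NT - flavivirus_orf_nt

# "Approximately the first quarter" of the polyprotein, in residues.
hcv_structural_aa = HCV_POLYPROTEIN_AA // 4

print("HCV ORF nucleotides:", hcv_orf_nt)
print("HCV non-coding nucleotides:", hcv_noncoding_nt)
print("Flavivirus ORF nucleotides:", flavivirus_orf_nt)
print("HCV structural region (residues):", hcv_structural_aa)
```

On these assumptions the ORF accounts for roughly nine tenths of each genome, which is consistent with a single continuous reading frame encoding the whole polyprotein.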

A schematic alignment of possible regions of a flaviviral polyprotein
(using Yellow Fever Virus as an example), and of a putative polyprotein encoded
in the major ORF of the HCV genome, is shown in Figure 1. Possible domains of
the HCV polyprotein are indicated in the figure. The Yellow Fever Virus
polyprotein contains, from the amino terminus to the carboxy terminus, the
nucleocapsid protein (C), the matrix protein (M), the envelope protein (E), and the
non-structural proteins 1, 2 (a+b), 3, 4 (a+b), and 5 (NS1, NS2, NS3, NS4, and NS5).
Based upon the putative amino acids encoded in the nucleotide sequence of
HCV1, a small domain at the extreme N-terminus of the HCV polyprotein
appears similar both in size and high content of basic residues to the nucleocapsid
protein (C) found at the N-terminus of flaviviral polyproteins. The non-structural
proteins 2, 3, 4, and 5 (NS2-5) of HCV and of yellow fever virus (YFV) appear to
have counterparts of similar size and hydropathicity, although the amino acid
sequences diverge. However, the region of HCV which would correspond to the
regions of YFV polyprotein which contains the M, E, and NS1 protein not only
differs in sequence, but also appears to be quite different in size and
hydropathicity. Thus, while certain domains of the HCV genome may be referred
to herein as, for example, NS1, or NS2, it should be understood that these
designations are for convenience of reference only; there may be considerable
differences between the HCV family and flaviviruses that have yet to be appreciated.

Due to the evolutionary relationship of the strains or isolates of
HCV, putative HCV strains and isolates are identifiable by their homology at the
polypeptide level. With respect to the isolates disclosed herein, new HCV strains
or isolates are expected to be at least about 40% homologous, some more than
about 70% homologous, and some even more than about 80% homologous: some
may be more than about 90% homologous at the polypeptide level. The
techniques for determining amino acid sequence homology are known in the art. For
example, the amino acid sequence may be determined directly and compared to
the sequences provided herein. Alternatively, the nucleotide sequence of the
genomic material of the putative HCV may be determined (usually via a cDNA
intermediate); the amino acid sequence encoded therein can be determined, and the
corresponding regions compared.
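
The comparison described above can be sketched in a few lines. The example below is illustrative only: it assumes that "homology" is measured as percent identity over two sequences that have already been aligned to equal length (with "-" marking gaps), which is one common convention and is not stated in the text. The sequences in the usage note are made-up placeholders, not HCV data.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent of alignment columns holding the same residue in both sequences.

    Both inputs must already be aligned to the same length; a column
    containing a gap character ('-') is never counted as a match.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("sequences must be aligned to the same length")
    if not aligned_a:
        raise ValueError("sequences must not be empty")
    matches = sum(1 for a, b in zip(aligned_a, aligned_b) if a == b and a != "-")
    return 100.0 * matches / len(aligned_a)


def homology_band(pct: float) -> str:
    """Map a percentage onto the approximate bands named in the text."""
    if pct > 90:
        return "more than about 90%"
    if pct > 80:
        return "more than about 80%"
    if pct > 70:
        return "more than about 70%"
    if pct >= 40:
        return "at least about 40%"
    return "below about 40%"
```

For example, `percent_identity("ACDEFGHIKL", "ACDEFGHIKV")` compares two ten-residue placeholder peptides that differ at one position, and `homology_band` then reports which of the stated thresholds the pair clears.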

The term "HCV protease" refers to an enzyme derived from HCV
which exhibits proteolytic activity, specifically the polypeptide encoded in the NS3
domain of the HCV genome. At least one strain of HCV contains a protease
believed to be substantially encoded by or within the following sequence:

Arg Arg Gly Arg Glu Ile Leu Leu Gly Pro
Ala Asp Gly Met Val Ser Lys Gly Trp Arg
Leu Leu Ala Pro Ile Thr Ala Tyr Ala Gln
Gln Thr Arg Gly Leu Leu Gly Cys Ile Ile
Thr Ser Leu Thr Gly Arg Asp Lys Asn Gln
Val Glu Gly Glu Val Gln Ile Val Ser Thr
Ala Ala Gln Thr Phe Leu Ala Thr Cys Ile
Asn Gly Val Cys Trp Thr Val Tyr His Gly
Ala Gly Thr Arg Thr Ile Ala Ser Pro Lys
Gly Pro Val Ile Gln Met Tyr Thr Asn Val
Asp Gln Asp Leu Val Gly Trp Pro Ala Ser
Gln Gly Thr Arg Ser Leu Thr Pro Cys Thr
Cys Gly Ser Ser Asp Leu Tyr Leu Val Thr
Arg His Ala Asp Val Ile Pro Val Arg Arg
Arg Gly Asp Ser Arg Gly Ser Leu Leu Ser
Pro Arg Pro Ile Ser Tyr Leu Lys Gly Ser
Ser Gly Gly Pro Leu Leu Cys Pro Ala Gly
His Ala Val Gly Ile Phe Arg Ala Ala Val
Cys Thr Arg Gly Val Ala Lys Ala Val Asp
Phe Ile Pro Val Glu Asn Leu Glu Thr Thr
Met Arg ...

The above N and C termini are putative, the actual termini being
defined by expression and processing in an appropriate host of a DNA construct
encoding the entire NS3 domain. It is understood that this sequence may vary
from strain to strain, as RNA viruses like HCV are known to exhibit a great deal
of variation. Further, the actual N and C termini may vary, as the protease is
cleaved from a precursor polyprotein: variations in the protease amino acid
sequence can result in cleavage from the polyprotein at different points. Thus,
the amino- and carboxy-termini may differ from strain to strain of HCV. The first
amino acid shown above corresponds to residue 60 in Figure 1. However, the
minimum sequence necessary for activity can be determined by routine methods.
The sequence may be truncated at either end by treating an appropriate
expression vector with an exonuclease (after cleavage at the 5' or 3' end of the coding
sequence) to remove any desired number of base pairs. The resulting coding
polynucleotide is then expressed and the sequence determined. In this manner
the activity of the resulting product may be correlated with the amino acid
sequence: a limited series of such experiments (removing progressively greater
numbers of base pairs) determines the minimum internal sequence necessary for
protease activity. We have found that the sequence may be substantially
truncated, particularly at the carboxy terminus, apparently with full retention of
protease activity. It is presently believed that a portion of the protein at the carboxy
terminus may exhibit helicase activity. However, helicase activity is not required
of the HCV proteases of the invention. The amino terminus may also be
truncated to a degree without loss of protease activity.

The amino acids underlined above are believed to be the residues
necessary for catalytic activity, based on sequence homology to putative flavivirus
serine proteases. Table 1 shows the alignment of the three serine protease
catalytic residues for HCV protease and the protease obtained from Yellow Fever
Virus, West Nile Fever virus, Murray Valley Fever virus, and Kunjin virus.
Although the other four flavivirus protease sequences exhibit higher homology
with each other than with HCV, a degree of homology is still observed with HCV.
This homology, however, was not sufficient for indication by currently available
alignment software. The indicated amino acids are numbered His79, Asp103, and
Ser161 in the sequence listed above (His139, Asp163, and Ser221 in Figure 1).

TABLE 1: Alignment of Active Residues by Sequence

Protease            His            Asp          Ser

HCV                 [illegible]    DQDLGWPAP    LKGSSGGPL
Yellow Fever        FHTMWHVTR      KEDLVAYGG    PSGTSGSPI
West Nile Fever     PHTLWHTTK      KEDRLCYGG    PTGTSGSPI
Murray Valley       THTLWHTTR      KEDRVTYGG    PIGTSGSPI
Kunjin virus        FHTLWHTTK      KEDRLCYGG    PTGTSGSPI
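
The residue numbering used for the protease sequence can be checked mechanically. The sketch below is a hedged illustration, not part of the specification: it transcribes the three-letter listing given earlier (with OCR artifacts normalized), converts it to one-letter code, and locates the candidate catalytic residues by searching for short motifs taken from the sequence itself. The helper names and the choice of motifs are assumptions made for this example.

```python
THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
}

# Transcription of the protease listing, ten residues per line.
PROTEASE_3 = """
Arg Arg Gly Arg Glu Ile Leu Leu Gly Pro
Ala Asp Gly Met Val Ser Lys Gly Trp Arg
Leu Leu Ala Pro Ile Thr Ala Tyr Ala Gln
Gln Thr Arg Gly Leu Leu Gly Cys Ile Ile
Thr Ser Leu Thr Gly Arg Asp Lys Asn Gln
Val Glu Gly Glu Val Gln Ile Val Ser Thr
Ala Ala Gln Thr Phe Leu Ala Thr Cys Ile
Asn Gly Val Cys Trp Thr Val Tyr His Gly
Ala Gly Thr Arg Thr Ile Ala Ser Pro Lys
Gly Pro Val Ile Gln Met Tyr Thr Asn Val
Asp Gln Asp Leu Val Gly Trp Pro Ala Ser
Gln Gly Thr Arg Ser Leu Thr Pro Cys Thr
Cys Gly Ser Ser Asp Leu Tyr Leu Val Thr
Arg His Ala Asp Val Ile Pro Val Arg Arg
Arg Gly Asp Ser Arg Gly Ser Leu Leu Ser
Pro Arg Pro Ile Ser Tyr Leu Lys Gly Ser
Ser Gly Gly Pro Leu Leu Cys Pro Ala Gly
His Ala Val Gly Ile Phe Arg Ala Ala Val
Cys Thr Arg Gly Val Ala Lys Ala Val Asp
Phe Ile Pro Val Glu Asn Leu Glu Thr Thr
Met Arg
"""


def to_one_letter(three_letter: str) -> str:
    """Convert whitespace-separated three-letter codes to a one-letter string."""
    return "".join(THREE_TO_ONE[code] for code in three_letter.split())


def motif_position(sequence: str, motif: str, offset_in_motif: int) -> int:
    """1-based position of the residue at offset_in_motif (0-based) within the first motif hit."""
    start = sequence.find(motif)
    if start < 0:
        raise ValueError(f"motif {motif!r} not found")
    return start + offset_in_motif + 1


protease = to_one_letter(PROTEASE_3)

# Motifs read from the listing; the offset selects the candidate catalytic residue.
his_pos = motif_position(protease, "TVYHG", 3)
asp_pos = motif_position(protease, "DQDLVGW", 2)
ser_pos = motif_position(protease, "GSSGGPL", 2)

# Offset between this listing and Figure 1, inferred from the legible pair Asp103 / Asp163.
FIGURE_1_OFFSET = 163 - 103

for name, pos in (("His", his_pos), ("Asp", asp_pos), ("Ser", ser_pos)):
    print(f"{name}{pos} in the listing, {name}{pos + FIGURE_1_OFFSET} in Figure 1 numbering")
```

Locating residues by motif rather than by hard-coded index makes the check robust to a transcription slip elsewhere in the listing: a dropped residue upstream would change the reported position and be noticed.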

Alternatively, one can make catalytic residue
assignments based on structural homology. Table 2 shows
alignment of HCV against the catalytic sites of several
well-characterized serine proteases based on structural
considerations: protease A from Streptomyces griseus, α-lytic
protease, bovine trypsin, chymotrypsin, and elastase (M.
James et al, Can J Biochem (1978) 56:396). Again, a degree
of homology is observed. The HCV residues identified are
numbered His79, Asp125, and Ser161 in the sequence listed
above.

TABLE 2: Alignment of Active Residues by Structure

Protease            His      Asp        Ser

S. griseus A        TAGHC    NNDYGII    GDSGGSL
α-Lytic protease    TAGHC    GNDRAWV    GDSGGSW
Bovine Trypsin      SAAHC    NNDIMLI    GDSGGPV
Chymotrypsin        TAAHC    NNDITLL    GDSGGPL
Elastase            TAAHC    GYDIALL    GDSGGPL
HCV                 TVYHG    SSDLYLV    GSSGGPL



The most direct manner to verify the residues essential to the active
site is to replace each residue individually with a residue of equivalent steric size.
This is easily accomplished by site-specific mutagenesis and similar methods
known in the art. If replacement of a particular residue with a residue of
equivalent size results in loss of activity, the essential nature of the replaced residue is
confirmed.
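
The bookkeeping for such a replacement series can be sketched as follows. This is an illustrative helper only: the function names are hypothetical, the peptide in the usage note is a short placeholder, and the replacement residues are arbitrary examples, since the text does not specify which residue of equivalent steric size to use at each position.

```python
from typing import Dict, List, Tuple


def point_mutant(sequence: str, position: int, replacement: str) -> Tuple[str, str]:
    """Return (name, mutant sequence) for a single-residue replacement.

    position is 1-based; the name uses the conventional form
    <original><position><replacement>, for example 'H4N'.
    """
    if not 1 <= position <= len(sequence):
        raise ValueError("position outside sequence")
    if len(replacement) != 1:
        raise ValueError("replacement must be a single one-letter residue")
    original = sequence[position - 1]
    if original == replacement:
        raise ValueError("replacement is identical to the original residue")
    mutant = sequence[: position - 1] + replacement + sequence[position:]
    return f"{original}{position}{replacement}", mutant


def mutant_panel(sequence: str, replacements: List[Tuple[int, str]]) -> Dict[str, str]:
    """Build one single-site mutant per (position, replacement) pair."""
    return dict(point_mutant(sequence, pos, new) for pos, new in replacements)


def essential_residues(activity_retained: Dict[str, bool]) -> List[str]:
    """Mutants whose activity was lost; the replaced residue is inferred to be essential."""
    return [name for name, retained in activity_retained.items() if not retained]
```

As a usage illustration, `mutant_panel("TVYHGAG", [(4, "N"), (2, "I")])` yields the placeholder mutants `H4N` and `V2I`, each differing from the parent at exactly one position. Activity measurements for each mutant would then be recorded and passed to `essential_residues`.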

The term "HCV protease analogs" refers to polypeptides which vary from the
full length protease sequence by deletion, alteration and/or addition to the amino
acid sequence of the native protease. HCV protease analogs include the
truncated proteases described above, as well as HCV protease muteins and fusion
proteins comprising HCV protease, truncated protease, or protease muteins.
Alterations to form HCV protease muteins are preferably conservative amino acid
substitutions, in which an amino acid is replaced with another naturally-occurring
amino acid of similar character. For example, the following substitutions are
considered "conservative":

Gly - Ala;              Lys - Arg;
Val - Ile - Leu;        Asn - Gln; and
Asp - Glu;              Phe - Trp - Tyr.

Nonconservative changes are generally substitutions of one of the above amino
acids with an amino acid from a different group (e.g., substituting Asn for Glu), or
substituting Cys, Met, His, or Pro for any of the above amino acids. Substitutions
involving common amino acids are conveniently performed by site specific
mutagenesis of an expression vector encoding the desired protein, and subsequent
expression of the altered form. One may also alter amino acids by synthetic or
semi-synthetic methods. For example, one may convert cysteine or serine residues
to selenocysteine by appropriate chemical treatment of the isolated protein.
Alternatively, one may incorporate uncommon amino acids in standard in vitro
protein synthetic methods. Typically, the total number of residues changed,
deleted or added to the native sequence in the muteins will be no more than
about 20, preferably no more than about 10, and most preferably no more than
about 5.
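
The grouping rules above lend themselves to a small lookup. The following sketch encodes only what the text states: the six conservative groups, the rule that a change between groups or to Cys, Met, His, or Pro is nonconservative, and the approximate limits of 20, 10, and 5 changed residues. Pairs the text does not address (for example Ser to Thr) are reported as "unclassified" rather than guessed; that label and all function names are assumptions of this example.

```python
from typing import FrozenSet, Optional

CONSERVATIVE_GROUPS = [
    frozenset({"Gly", "Ala"}),
    frozenset({"Val", "Ile", "Leu"}),
    frozenset({"Asp", "Glu"}),
    frozenset({"Lys", "Arg"}),
    frozenset({"Asn", "Gln"}),
    frozenset({"Phe", "Trp", "Tyr"}),
]
ALWAYS_NONCONSERVATIVE = frozenset({"Cys", "Met", "His", "Pro"})


def group_of(residue: str) -> Optional[FrozenSet[str]]:
    """Return the conservative group containing residue, or None if it is in no group."""
    for group in CONSERVATIVE_GROUPS:
        if residue in group:
            return group
    return None


def classify_substitution(original: str, replacement: str) -> str:
    """Classify a substitution using only the rules stated in the text."""
    if original == replacement:
        return "identical"
    original_group = group_of(original)
    if original_group is None:
        return "unclassified"  # the text gives no rule for this starting residue
    if replacement in original_group:
        return "conservative"
    if replacement in ALWAYS_NONCONSERVATIVE or group_of(replacement) is not None:
        return "nonconservative"
    return "unclassified"


def mutein_tier(residues_changed: int) -> str:
    """Map a count of changed, deleted, or added residues onto the stated preferences."""
    if residues_changed < 0:
        raise ValueError("count must be non-negative")
    if residues_changed <= 5:
        return "most preferred (no more than about 5)"
    if residues_changed <= 10:
        return "preferred (no more than about 10)"
    if residues_changed <= 20:
        return "typical (no more than about 20)"
    return "outside the typical range"
```

For instance, `classify_substitution("Asn", "Glu")` reproduces the nonconservative example given in the text, while `classify_substitution("Lys", "Arg")` falls within a single group.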

The term "fusion protein" generally refers to a polypeptide comprising
an amino acid sequence drawn from two or more individual proteins. In the
present invention, "fusion protein" is used to denote a polypeptide comprising the
HCV protease, truncate, mutein or a functional portion thereof, fused to a
non-HCV protein or polypeptide ("fusion partner"). Fusion proteins are most
conveniently produced by expression of a fused gene, which encodes a portion of one
polypeptide at the 5' end and a portion of a different polypeptide at the 3' end,
where the different portions are joined in one reading frame which may be
expressed in a suitable host. It is presently preferred (although not required) to
position the HCV protease or analog at the carboxy terminus of the fusion
protein, and to employ a functional enzyme fragment at the amino terminus. As the
HCV protease is normally expressed within a large polyprotein, it is not expected
to include cell transport signals (e.g., export or secretion signals). Suitable
functional enzyme fragments are those polypeptides which exhibit a quantifiable
activity when expressed fused to the HCV protease. Exemplary enzymes include,
without limitation, β-galactosidase (β-gal), β-lactamase, horseradish peroxidase (HRP),
glucose oxidase (GO), human superoxide dismutase (hSOD), urease, and the like.
These enzymes are convenient because the amount of fusion protein produced
can be quantified by means of simple colorimetric assays. Alternatively, one may
employ antigenic proteins or fragments, to permit simple detection and
quantification of fusion proteins using antibodies specific for the fusion partner. The
presently preferred fusion partner is hSOD.

B. General Method 

The practice of the present invention generally employs conventional
techniques of molecular biology, microbiology, recombinant DNA, and
immunology, which are within the skill of the art. Such techniques are explained
fully in the literature. See for example J. Sambrook et al, "Molecular Cloning; A
Laboratory Manual" (1989); "DNA Cloning", Vol. I and II (D.N. Glover ed. 1985);
"Oligonucleotide Synthesis" (M.J. Gait ed, 1984); "Nucleic Acid Hybridization"
(B.D. Hames & S.J. Higgins eds. 1984); "Transcription And Translation" (B.D.
Hames & S.J. Higgins eds. 1984); "Animal Cell Culture" (R.I. Freshney ed. 1986);
"Immobilized Cells And Enzymes" (IRL Press, 1986); B. Perbal, "A Practical
Guide To Molecular Cloning" (1984); the series, "Methods In Enzymology"
(Academic Press, Inc.); "Gene Transfer Vectors For Mammalian Cells" (J.H.
Miller and M.P. Calos eds. 1987, Cold Spring Harbor Laboratory); Meth
Enzymol (1987) 154 and 155 (Wu and Grossman, and Wu, eds., respectively);
Mayer & Walker, eds. (1987), "Immunochemical Methods In Cell And Molecular
Biology" (Academic Press, London); Scopes, "Protein Purification: Principles And
Practice", 2nd Ed (Springer-Verlag ... Of Experimental

designated host are used. Among prokaryotic hosts, E. coli is most frequently
used. Expression control sequences for prokaryotes include promoters, optionally
containing operator portions, and ribosome binding sites. Transfer vectors
compatible with prokaryotic hosts are commonly derived from, for example, pBR322,
a plasmid containing operons conferring ampicillin and tetracycline resistance, and
the various pUC vectors, which also contain sequences conferring antibiotic
resistance markers. These plasmids are commercially available. The markers may be
used to obtain successful transformants by selection. Commonly used prokaryotic
control sequences include the β-lactamase (penicillinase) and lactose promoter
systems (Chang et al, Nature (1977) 198:1056), the tryptophan (trp) promoter
system (Goeddel et al, Nuc Acids Res (1980) 8:4057) and the lambda-derived P L
promoter and N gene ribosome binding site (Shimatake et al, Nature (1981)
292:128) and the hybrid tac promoter (De Boer et al, Proc Nat Acad Sci USA
(1983) 292:128) derived from sequences of the trp and lac UV5 promoters. The
foregoing systems are particularly compatible with E. coli; if desired, other
prokaryotic hosts such as strains of Bacillus or Pseudomonas may be used, with
corresponding control sequences.

Eukaryotic hosts include without limitation yeast and mammalian
cells in culture systems. Yeast expression hosts include Saccharomyces,
Kluyveromyces, Pichia, and the like. Saccharomyces cerevisiae and Saccharomyces
carlsbergensis and K. lactis are the most commonly used yeast hosts, and are
convenient fungal hosts. Yeast-compatible vectors carry markers which permit
selection of successful transformants by conferring prototrophy to auxotrophic
mutants or resistance to heavy metals on wild-type strains. Yeast compatible
vectors may employ the 2μ origin of replication, the combination of CEN3 and
ARS1 or other means for assuring replication, such as sequences which will
result in incorporation of an appropriate fragment into the host cell genome.

Control sequences for yeast vectors are known in the art and include promoters
for the synthesis of glycolytic enzymes (Hess et al, J Adv Enzyme Reg (1968) 7:
149; Holland et al, Biochem (1978), 17:4900), including the promoter for
3-phosphoglycerate kinase (R. Hitzeman et al, J Biol Chem (1980) 255:2073).
Terminators may also be included, such as those derived from the enolase gene (Holland,
J Biol Chem (1981) 256:1385). Particularly useful control systems are those which
comprise the glyceraldehyde-3 phosphate dehydrogenase (GAPDH) promoter or
alcohol dehydrogenase (ADH) regulatable promoter, terminators also derived
from GAPDH, and if secretion is desired, a leader sequence derived from yeast
α-factor (see U.S. Pat No. 4,870,008, incorporated herein by reference).

A presently preferred expression system employs the ubiquitin
leader as the fusion partner. Copending application USSN 7/390,599 filed 7
August 1989 disclosed vectors for high expression of yeast ubiquitin fusion
proteins. Yeast ubiquitin provides a 76 amino acid polypeptide which is
automatically cleaved from the fused protein upon expression. The ubiquitin amino acid
sequence is as follows:

Leu Glu Val Glu Ser Ser Asp Thr Ile Asp Asn Val

In addition, the transcriptional regulatory region and the
transcriptional initiation region which are operably linked may be such that they are not
naturally associated in the wild-type organism. These systems are described in
detail in EPO 120,551, published October 3, 1984; EPO 116,201, published
August 22, 1984; and EPO 164,556, published December 18, 1985, all of which
are commonly owned with the present invention, and are hereby incorporated
herein by reference in full.

Mammalian cell lines available as hosts for expression are known in
the art and include many immortalized cell lines available from the American
Type Culture Collection (ATCC), including HeLa cells, Chinese hamster ovary
(CHO) cells, baby hamster kidney (BHK) cells, and a number of other cell lines.
Suitable promoters for mammalian cells are also known in the art and include
viral promoters such as that from Simian Virus 40 (SV40) (Fiers et al, Nature
(1978) 273:113), Rous sarcoma virus (RSV), adenovirus (ADV), and bovine
papilloma virus (BPV). Mammalian cells may also require terminator sequences and
poly-A addition sequences. Enhancer sequences which increase expression may
also be included, and sequences which promote amplification of the gene may
also be desirable (for example methotrexate resistance genes). These sequences
are known in the art.

Vectors suitable for replication in mammalian cells are known in the
art, and may include viral replicons, or sequences which insure integration of the
appropriate sequences encoding HCV epitopes into the host genome. For
example, another vector used to express foreign DNA is Vaccinia virus. In this
case the heterologous DNA is inserted into the Vaccinia genome. Techniques for
the insertion of foreign DNA into the vaccinia virus genome are known in the art,
and may utilize, for example, homologous recombination. The heterologous DNA
is generally inserted into a gene which is non-essential to the virus, for example,
the thymidine kinase gene (tk), which also provides a selectable marker. Plasmid
vectors that greatly facilitate the construction of recombinant viruses have been
described (see, for example, Mackett et al, J Virol (1984) 49:857; Chakrabarti et
al, Mol Cell Biol (1985) 5:3403; Moss, in GENE TRANSFER VECTORS FOR
MAMMALIAN CELLS (Miller and Calos, eds., Cold Spring Harbor Laboratory,
NY, 1987), p. 10). Expression of the HCV polypeptide then occurs in cells or
animals which are infected with the live recombinant vaccinia virus.

In order to detect whether or not the HCV polypeptide is expressed
from the vaccinia vector, BSC 1 cells may be infected with the recombinant vector
and grown on microscope slides under conditions which allow expression. The
cells may then be acetone-fixed, and immunofluorescence assays performed using
serum which is known to contain anti-HCV antibodies to a polypeptide(s)
encoded in the region of the HCV genome from which the HCV segment in the
recombinant expression vector was derived.

Other systems for expression of eukaryotic or viral genomes include
insect cells and vectors suitable for use in these cells. These systems are known in
the art, and include, for example, insect expression transfer vectors derived from
the baculovirus Autographa californica nuclear polyhedrosis virus (AcNPV), which
is a helper-independent, viral expression vector. Expression vectors derived from
this system usually use the strong viral polyhedrin gene promoter to drive
expression of heterologous genes. Currently the most commonly used transfer
vector for introducing foreign genes into AcNPV is pAc373 (see PCT
WO89/046699 and USSN 7/456,637). Many other vectors known to those of skill in the
art have also been designed for improved expression. These include, for example,
pVL985 (which alters the polyhedrin start codon from ATG to ATT, and
introduces a BamHI cloning site 32 bp downstream from the ATT; See Luckow and
Summers, Virol (1989) 17:31). AcNPV transfer vectors for high level expression
of nonfused foreign proteins are described in copending applications PCT
WO89/046699 and USSN 7/456,637. A unique BamHI site is located following
position -8 with respect to the translation initiation codon ATG of the polyhedrin
gene. There are no cleavage sites for SmaI, PstI, BglII, XbaI or SstI. Good
expression of nonfused foreign proteins usually requires foreign genes that ideally
have a short leader sequence containing suitable translation initiation signals
preceding an ATG start signal. The plasmid also contains the polyhedrin
polyadenylation signal and the ampicillin-resistance (amp) gene and origin of replication for
selection and propagation in E. coli.

Methods for the introduction of heterologous DNA into the desired
site in the baculovirus are known in the art. (See Summers and Smith, Texas
Agricultural Experiment Station Bulletin No. 1555; Smith et al, Mol Cell Biol
(1983) 3:2156-2165; and Luckow and Summers, Virol (1989) 17:31). For example,
the heterologous DNA can be inserted into a gene such as the polyhedrin gene by
homologous recombination, or into a restriction enzyme site engineered into the
desired baculovirus gene. The inserted sequences may be those which encode all
or varying segments of the polyprotein, or other ORFs which encode viral
polypeptides. For example, the insert could encode the following numbers of amino acid
segments from the polyprotein: amino acids 1-1078; amino acids 332-662; amino
acids 406-662; amino acids 156-328, and amino acids 199-328.
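
The amino acid ranges listed above translate directly into coding-sequence coordinates. The sketch below is a worked illustration under two stated assumptions: nucleotide 1 is taken to be the first base of the codon for polyprotein residue 1, and each residue is encoded by exactly three nucleotides. The class and function names are hypothetical and are not taken from the specification.

```python
from typing import List, NamedTuple, Tuple


class Segment(NamedTuple):
    aa_start: int   # first residue, 1-based, inclusive
    aa_end: int     # last residue, 1-based, inclusive
    nt_start: int   # first nucleotide of the first codon, 1-based
    nt_end: int     # last nucleotide of the last codon, 1-based
    aa_length: int  # number of residues encoded


def segment_coordinates(aa_start: int, aa_end: int) -> Segment:
    """Convert an inclusive residue range into inclusive ORF nucleotide coordinates."""
    if aa_start < 1 or aa_end < aa_start:
        raise ValueError("invalid residue range")
    nt_start = 3 * (aa_start - 1) + 1
    nt_end = 3 * aa_end
    return Segment(aa_start, aa_end, nt_start, nt_end, aa_end - aa_start + 1)


# Residue ranges quoted in the text.
POLYPROTEIN_SEGMENTS: List[Tuple[int, int]] = [
    (1, 1078), (332, 662), (406, 662), (156, 328), (199, 328),
]

for start, end in POLYPROTEIN_SEGMENTS:
    seg = segment_coordinates(start, end)
    print(f"aa {seg.aa_start}-{seg.aa_end}: nt {seg.nt_start}-{seg.nt_end} "
          f"({seg.aa_length} residues, {3 * seg.aa_length} nt)")
```

Coordinates computed this way are relative to the ORF only; positions in an actual clone or genome would be shifted by the length of any upstream sequence.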

The signals for post-translational modifications, such as signal peptide
cleavage, proteolytic cleavage, and phosphorylation, appear to be recognized
by insect cells. The signals required for secretion and nuclear accumulation also
appear to be conserved between the invertebrate cells and vertebrate cells.
Examples of the signal sequences from vertebrate cells which are effective in
invertebrate cells are known in the art, for example, the human interleukin-2
signal (IL2s) which signals for secretion from the cell, is recognized and properly
removed in insect cells.

Transformation may be by any known method for introducing polynucleotides
into a host cell, including, for example, packaging the polynucleotide in
a virus and transducing a host cell with the virus, and by direct uptake of the
polynucleotide. The transformation procedure used depends upon the host to be
transformed. Bacterial transformation by direct uptake generally employs
treatment with calcium or rubidium chloride (Cohen, Proc Nat Acad Sci USA (1972)
69:2110; T. Maniatis et al, "Molecular Cloning: A Laboratory Manual" (Cold
Spring Harbor Press, Cold Spring Harbor, NY, 1982)). Yeast transformation by
direct uptake may be carried out using the method of Hinnen et al, Proc Nat
Acad Sci USA (1978) 75:1929. Mammalian transformations by direct uptake may
be conducted using the calcium phosphate precipitation method of Graham and
Van der Eb, Virol (1978) 52:546, or the various known modifications thereof.
Other methods for introducing recombinant polynucleotides into cells, particularly
into mammalian cells, include dextran-mediated transfection, calcium phosphate
mediated transfection, polybrene mediated transfection, protoplast fusion,
electroporation, encapsulation of the polynucleotide(s) in liposomes, and direct
microinjection of the polynucleotides into nuclei.

Vector construction employs techniques which are known in the art.
Site-specific DNA cleavage is performed by treating with suitable restriction
enzymes under conditions which generally are specified by the manufacturer of
these commercially available enzymes. In general, about 1 μg of plasmid or DNA
sequence is cleaved by 1 unit of enzyme in about 20 μL buffer solution by
incubation for 1-2 hr at 37°C. After incubation with the restriction enzyme, protein is
removed by phenol/chloroform extraction and the DNA recovered by precipitation
with ethanol. The cleaved fragments may be separated using polyacrylamide
or agarose gel electrophoresis techniques, according to the general procedures
described in Meth Enzymol (1980) 65:499-560.
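
The cleavage and size-separation steps described above can be mimicked in silico. The following is a minimal Python sketch, assuming a linear DNA molecule; the function name `digest` and the toy sequence are hypothetical and serve only as an illustration. It cuts the top strand at every EcoRI site (G^AATTC) and reports the fragment lengths that a gel would be expected to resolve.

```python
def digest(seq: str, site: str = "GAATTC", cut_offset: int = 1) -> list:
    """Cut a linear DNA sequence at every occurrence of `site`.

    cut_offset is the number of bases of the site that stay on the
    upstream fragment (EcoRI cuts G^AATTC, so cut_offset = 1).
    """
    seq = seq.upper()
    fragments = []
    start = 0
    pos = seq.find(site)
    while pos != -1:
        cut = pos + cut_offset
        fragments.append(seq[start:cut])
        start = cut
        pos = seq.find(site, pos + 1)
    fragments.append(seq[start:])  # remainder after the last cut
    return fragments


# Hypothetical toy sequence containing two EcoRI sites.
toy = "ATGCGAATTCTTAGGCCGAATTCAA"
frags = digest(toy)
print([len(f) for f in frags])
```

The fragments concatenate back to the input, which is a convenient sanity check when planning a digest.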

Sticky-ended cleavage fragments may be blunt ended using E. coli
DNA polymerase I (Klenow fragment) with the appropriate deoxynucleotide
triphosphates (dNTPs) present in the mixture. Treatment with S1 nuclease may also
be used, resulting in the hydrolysis of any single stranded DNA portions.

Ligations are carried out under standard buffer and temperature 
conditions using T4 DNA ligase and ATP; sticky end ligations require less ATP
and less ligase than blunt end ligations. When vector fragments are used as part 
of a ligation mixture, the vector fragment is often treated with bacterial alkaline 
phosphatase (BAP) or calf intestinal alkaline phosphatase to remove the 5'-phos- 
phate, thus preventing religation of the vector. Alternatively, restriction enzyme 
digestion of unwanted fragments can be used to prevent ligation. 

Ligation mixtures are transformed into suitable cloning hosts, such
as E. coli, and successful transformants selected using the markers incorporated
(e.g., antibiotic resistance), and screened for the correct construction.

Synthetic oligonucleotides may be prepared using an automated
oligonucleotide synthesizer as described by Warner, DNA (1984) 3:401. If
desired, the synthetic strands may be labeled with 32P by treatment with
polynucleotide kinase in the presence of 32P-ATP under standard reaction conditions.

DNA sequences, including those isolated from cDNA libraries, may
be modified by known techniques, for example by site directed mutagenesis (see
e.g., Zoller, Nuc Acids Res (1982) 10:6487). Briefly, the DNA to be modified is
packaged into phage as a single stranded sequence, and converted to a double
stranded DNA with DNA polymerase, using as a primer a synthetic oligonucleotide
complementary to the portion of the DNA to be modified, where the desired
modification is included in the primer sequence. The resulting double stranded
DNA is transformed into a phage-supporting host bacterium. Cultures of the
transformed bacteria which contain copies of each strand of the phage are plated
in agar to obtain plaques. Theoretically, 50% of the new plaques contain phage
having the mutated sequence, and the remaining 50% have the original sequence.
Replicates of the plaques are hybridized to labeled synthetic probe at temperatures
and conditions which permit hybridization with the correct strand, but not
with the unmodified sequence. The sequences which have been identified by
hybridization are recovered and cloned.

DNA libraries may be probed using the procedure of Grunstein and
Hogness, Proc Nat Acad Sci USA (1975) 72:3961. Briefly, in this procedure the
DNA to be probed is immobilized on nitrocellulose filters, denatured, and
prehybridized with a buffer containing 0-50% formamide, 0.75 M NaCl, 75 mM Na
citrate, 0.02% (wt/v) each of bovine serum albumin, polyvinylpyrrolidone, and
Ficoll®, 50 mM NaH2PO4 (pH 6.5), 0.1% SDS, and 100 μg/mL carrier denatured
DNA. The percentage of formamide in the buffer, as well as the time and
temperature conditions of the prehybridization and subsequent hybridization steps,
depend on the stringency required. Oligomeric probes which require lower
stringency conditions are generally used with low percentages of formamide, lower
temperatures, and longer hybridization times. Probes containing more than 30 or
40 nucleotides, such as those derived from cDNA or genomic sequences, generally
employ higher temperatures, e.g., about 40-42°C, and a high percentage of
formamide, e.g., 50%. Following prehybridization, 5'-32P-labeled oligonucleotide probe is
added to the buffer, and the filters are incubated in this mixture under
hybridization conditions. After washing, the treated filters are subjected to
autoradiography to show the location of the hybridized probe; DNA in
corresponding locations on the original agar plates is used as the source of the desired
DNA.

For routine vector constructions, ligation mixtures are transformed
into E. coli strain HB101 or other suitable hosts, and successful transformants
selected by antibiotic resistance or other markers. Plasmids from the transformants
are then prepared according to the method of Clewell et al, Proc Nat Acad Sci
USA (1969) 62:1159, usually following chloramphenicol amplification (Clewell, J
Bacteriol (1972) 110:667). The DNA is isolated and analyzed, usually by restriction
enzyme analysis and/or sequencing. Sequencing may be performed by the
dideoxy method of Sanger et al, Proc Nat Acad Sci USA (1977) 74:5463, as further
described by Messing et al, Nuc Acids Res (1981) 9:309, or by the method of
Maxam et al, Meth Enzymol (1980) 65:499. Problems with band compression,
which are sometimes observed in GC-rich regions, were overcome by use of
7-deazoguanosine according to Barr et al, Biotechniques (1986) 4:428.

The enzyme-linked immunosorbent assay (ELISA) can be used to
measure either antigen or antibody concentrations. This method depends upon 
conjugation of an enzyme to either an antigen or an antibody, and uses the bound 
enzyme activity as a quantitative label. To measure antibody, the known antigen 
is fixed to a solid phase (e.g., a microtiter dish, plastic cup, dipstick, plastic bead, 
or the like), incubated with test serum dilutions, washed, incubated with anti- 
immunoglobulin labeled with an enzyme, and washed again. Enzymes suitable for 
labeling are known in the art, and include, for example, horseradish peroxidase 
(HRP). Enzyme activity bound to the solid phase is usually measured by adding a 
specific substrate, and determining product formation or substrate utilization 
colorimetrically. The enzyme activity bound is a direct function of the amount of 
antibody bound. 

To measure antigen, a known specific antibody is fixed to the solid 
phase, the test material containing antigen is added, after an incubation the solid 
phase is washed, and a second enzyme-labeled antibody is added. After washing, 
substrate is added, and enzyme activity is measured colorimetrically, and related
to antigen concentration. 

Proteases of the invention may be assayed for activity by cleaving a
substrate which provides detectable cleavage products. As the HCV protease is
believed to cleave itself from the genomic polyprotein, one can employ this
autocatalytic activity both to assay expression of the protein and determine activity.
For example, if the protease is joined to its fusion partner so that the HCV
protease N-terminal cleavage signal (Arg-Arg) is included, the expression product will
cleave itself into fusion partner and active HCV protease. One may then assay
the products, for example by western blot, to verify that the proteins produced
correspond in size to the separate fusion partner and protease proteins. It is
presently preferred to employ small peptide p-nitrophenyl esters or methylcoumarins,
as cleavage may then be followed by spectrophotometric or fluorescent assays.
Following the method described by E.D. Matayoshi et al, Science (1990) 242:231-35,
one may attach a fluorescent label to one end of the substrate and a quenching
molecule to the other end: cleavage is then determined by measuring the
resulting increase in fluorescence. If a suitable enzyme or antigen has been
employed as the fusion partner, the quantity of protein produced may easily be
determined. Alternatively, one may exclude the HCV protease N-terminal cleavage
signal (preventing self-cleavage) and add a separate cleavage substrate, such
as a fragment of the HCV NS3 domain including the native processing signal or a
synthetic analog.

In the absence of this protease activity, the HCV polyprotein should
remain in its unprocessed form, and thus render the virus noninfectious. Thus,
the protease is useful for assaying pharmaceutical agents for control of HCV, as
compounds which inhibit the protease activity sufficiently will also inhibit viral
infectivity. Such inhibitors may take the form of organic compounds, particularly
compounds which mimic the cleavage site of HCV recognized by the protease.
Three of the putative cleavage sites of the HCV polyprotein have the following
amino acid sequences:

Val-Ser-Ala-Arg-Arg // Gly-Arg-Glu-Ile-Leu-Leu-Gly
Ala-Ile-Leu-Arg-Arg // His-Val-Gly-Pro-
Val-Ser-Cys-Gln-Arg // Gly-Tyr-

These sites are characterized by the presence of two basic amino
acids immediately before the cleavage site, and are similar to the cleavage sites
recognized by other flavivirus proteases. Thus, suitable protease inhibitors may be
prepared which mimic the basic/basic/small neutral motif of the HCV cleavage
sites, but substituting a nonlabile linkage for the peptide bond cleaved in the
natural substrate. Suitable inhibitors include peptide trifluoromethyl ketones,
peptide boronic acids, peptide α-ketoesters, peptide difluoroketo compounds,
peptide aldehydes, peptide diketones, and the like. For example, the peptide
aldehyde N-acetyl-phenylalanyl-glycinaldehyde is a potent inhibitor of the protease
papain. One may conveniently prepare and assay large mixtures of peptides using
the methods disclosed in U.S. Patent application Serial No. 7/189,318, filed 2 May
1988 (published as PCT WO89/10931), incorporated herein by reference. This
application teaches methods for generating mixtures of peptides up to hexapeptides
having all possible amino acid sequences, and further teaches assay methods
for identifying those peptides capable of binding to proteases.
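
The basic/basic/small neutral motif can be expressed as a simple pattern search. The sketch below is illustrative only: the helper names, the choice of Arg/Lys as "basic" and the choice of Gly/Ala/Ser as "small neutral" are assumptions made for the example, not definitions taken from the text. It converts a three-letter peptide to one-letter code and reports each position at which two basic residues are followed by a small neutral residue.

```python
import re

# Three-letter to one-letter codes for the residues used in the listed sites.
THREE_TO_ONE = {
    "Val": "V", "Ser": "S", "Ala": "A", "Arg": "R", "Gly": "G",
    "Glu": "E", "Ile": "I", "Leu": "L", "His": "H", "Pro": "P",
    "Cys": "C", "Gln": "Q", "Tyr": "Y", "Lys": "K",
}


def to_one_letter(peptide: str) -> str:
    """Convert 'Val-Ser-Ala' style notation to 'VSA'."""
    return "".join(THREE_TO_ONE[r] for r in peptide.split("-") if r)


# Assumed reading of the motif: [Arg/Lys][Arg/Lys] // [Gly/Ala/Ser].
# The pattern is zero-width, so each match marks the scissile position.
MOTIF = re.compile(r"(?<=[RK][RK])(?=[GAS])")


def candidate_sites(seq: str) -> list:
    """Return indices i such that cleavage would fall between seq[i-1] and seq[i]."""
    return [m.start() for m in MOTIF.finditer(seq)]


first_site = to_one_letter("Val-Ser-Ala-Arg-Arg") + to_one_letter("Gly-Arg-Glu-Ile-Leu-Leu-Gly")
print(first_site, candidate_sites(first_site))
```

Applied to the first listed site, the search reports the position between Arg-Arg and Gly. The residue classes can be widened or narrowed by editing the pattern.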

Other protease inhibitors may be proteins, particularly antibodies 
and antibody derivatives. Recombinant expression systems may be used to gener- 
ate quantities of protease sufficient for production of monoclonal antibodies 
(MAbs) specific for the protease. Suitable antibodies for protease inhibition will 
bind to the protease in a manner reducing or eliminating the enzymatic activity, 
typically by obscuring the active site. Suitable MAbs may be used to generate 
derivatives, such as Fab fragments, chimeric antibodies, altered antibodies, univalent
antibodies, and single domain antibodies, using methods known in the art.

Protease inhibitors are screened using methods of the invention. In
general, a substrate is employed which mimics the enzyme's natural substrate, but
which provides a quantifiable signal when cleaved. The signal is preferably
detectable by colorimetric or fluorometric means: however, other methods such
as HPLC or silica gel chromatography, GC-MS, nuclear magnetic resonance, and
the like may also be useful. After optimum substrate and enzyme concentrations
are determined, a candidate protease inhibitor is added to the reaction mixture at
a range of concentrations. The assay conditions ideally should resemble the
conditions under which the protease is to be inhibited in vivo, i.e., under physiologic
pH, temperature, ionic strength, etc. Suitable inhibitors will exhibit strong
protease inhibition at concentrations which do not raise toxic side effects in the
subject. Inhibitors which compete for binding to the protease active site may require
concentrations equal to or greater than the substrate concentration, while
inhibitors capable of binding irreversibly to the protease active site may be added in
concentrations on the order of the enzyme concentration.
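
Data from such a concentration series are commonly summarized as percent inhibition and a half-maximal inhibitory concentration. The following Python sketch is one possible way to do this, under stated assumptions: the IC50 summary, the log-linear interpolation, the function names, and the example rates are all hypothetical illustrations and are not taken from the text.

```python
import math


def percent_inhibition(rate: float, uninhibited_rate: float) -> float:
    """Percent reduction in cleavage rate relative to the no-inhibitor control."""
    return 100.0 * (1.0 - rate / uninhibited_rate)


def ic50_by_interpolation(concs, rates, uninhibited_rate):
    """Estimate the concentration giving 50% inhibition.

    Interpolates linearly in log10(concentration) between the two tested
    concentrations that bracket 50% inhibition. Returns None when the
    series does not bracket 50%.
    """
    prev = None
    for conc, rate in sorted(zip(concs, rates)):
        inh = percent_inhibition(rate, uninhibited_rate)
        if prev is not None and prev[1] < 50.0 <= inh:
            c0, i0 = prev
            frac = (50.0 - i0) / (inh - i0)
            log_c = math.log10(c0) + frac * (math.log10(conc) - math.log10(c0))
            return 10 ** log_c
        prev = (conc, inh)
    return None


# Hypothetical data: inhibitor concentrations (arbitrary units) and signal rates.
concs = [0.1, 1.0, 10.0, 100.0]
rates = [90.0, 75.0, 25.0, 5.0]
estimate = ic50_by_interpolation(concs, rates, uninhibited_rate=100.0)
print(estimate)
```

A full analysis would normally fit a dose-response model to replicate measurements; the interpolation here is only a quick first estimate.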

In a presently preferred embodiment, an inactive protease mutein is
employed rather than an active enzyme. It has been found that replacing a
critical residue within the active site of a protease (e.g., replacing the active site
Ser of a serine protease) does not significantly alter the structure of the enzyme,
and thus preserves the binding specificity. The altered enzyme still recognizes and
binds to its proper substrate, but fails to effect cleavage. Thus, in one method of
the invention an inactivated HCV protease is immobilized, and a mixture of
candidate inhibitors added. Inhibitors that closely mimic the enzyme's preferred
recognition sequence will compete more successfully for binding than other
candidate inhibitors. The poorly-binding candidates may then be separated, and the
identity of the strongly-binding inhibitors determined. For example, HCV
protease may be prepared substituting Ala for Ser ai (Fig. 1), providing an enzyme
capable of binding the HCV protease substrate, but incapable of cleaving it. The
resulting protease mutein is then bound to a solid support, for example Sephadex®
beads, and packed into a column. A mixture of candidate protease inhibitors in
solution is then passed through the column and fractions collected. The last
fractions to elute will contain the strongest-binding compounds, and provide the
preferred protease inhibitor candidates.

Protease inhibitors may be administered by a variety of methods,
such as intravenously, orally, intramuscularly, intraperitoneally, bronchially,
intranasally, and so forth. The preferred route of administration will depend upon the
nature of the inhibitor. Inhibitors prepared as organic compounds may often be
administered orally (which is generally preferred) if well absorbed. Protein-based
inhibitors (such as most antibody derivatives) must generally be administered by
parenteral routes.

Examples

The examples presented below are provided as a further guide to 
the practitioner of ordinary skill in the art, and are not to be construed as limiting 
the invention in any way. 

Example 1 

(Preparation of HCV cDNA) 
A genomic library of HCV cDNA was prepared as described in PCT
WO89/046699 and USSN 7/456,637. This library, ATCC accession no. 40394, has
been deposited as set forth below.




Example 2




(Expression of the Polypeptide Encoded in Clone 5-1-1.) 
(A) The HCV polypeptide encoded within clone 5-1-1 (see
Example 1) was expressed as a fusion polypeptide with human superoxide
dismutase (SOD). This was accomplished by subcloning the clone 5-1-1 cDNA insert
into the expression vector pSODCF1 (K.S. Steimer et al, J Virol (1986) 58:9;
EPO 138,111) as follows. The SOD/5-1-1 expression vector was transformed into
E. coli D1210 cells. These cells, named Cf1/5-1-1 in E. coli, were deposited as set
forth below and have an ATCC accession no. of 67967.

First, DNA isolated from pSODCF1 was treated with BamHI and
EcoRI, and the following linker was ligated into the linear DNA created by the
restriction enzymes:

GAT CCT GGA ATT CTG ATA A
     GA CCT TAA GAC TAT TTT AA

After cloning, the plasmid containing the insert was isolated.

Plasmid containing the insert was restricted with EcoRI. The HCV
cDNA insert in clone 5-1-1 was excised with EcoRI, and ligated into this EcoRI
linearized plasmid DNA. The DNA mixture was used to transform E. coli strain
D1210 (Sadler et al, Gene (1980) 8:279). Recombinants with the 5-1-1 cDNA in
the correct orientation for expressing the ORF shown in Figure 1 were identified
by restriction mapping and nucleotide sequencing.

Recombinant bacteria from one clone were induced to express the
SOD-HCV5-1-1 polypeptide by growing the bacteria in the presence of IPTG.

Three separate expression vectors, pcf1AB, pcf1CD, and pcf1EF
were created by ligating three new linkers, AB, CD, and EF to a BamHI-EcoRI
fragment derived by digesting to completion the vector pSODCF1 with EcoRI and
BamHI, followed by treatment with alkaline phosphatase. The linkers were
created from six oligomers, A, B, C, D, E, and F. Each oligomer was phosphorylated
by treatment with kinase in the presence of ATP prior to annealing to its
complementary oligomer. The sequences of the synthetic linkers were the following:

Name    DNA Sequence (5' to 3')

A       GATC CTG AAT TCC TGA TAA
B       GAC TTA AGG ACT ATT TTA A
C       GATC CGA ATT CTG TGA TAA
D       GCT TAA GAC ACT ATT TTA A
E       GATC CTG GAA TTC TGA TAA
F       GAC CTT AAG ACT ATT TTA A





Each of the three linkers destroys the original EcoRI site, and 
creates a new EcoRI site within the linker, but within a different reading frame. 
Thus, the HCV cDNA EcoRI fragments isolated from the clones, when inserted 
into the expression vector, were in three different reading frames. 
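
The frame shift introduced by the three linkers can be checked directly from the linker sequences. The sketch below is a minimal illustration: it takes the upper strands of linkers AB, CD, and EF as listed, locates the new EcoRI site (GAATTC) in each, and reports its offset modulo 3. Measuring the frame from the first base of the linker is an assumption made for the example; the absolute frame relative to the SOD coding sequence is not computed here.

```python
# Upper strands of the three linkers (oligomers A, C, and E), spaces removed.
LINKERS = {
    "AB": "GATCCTGAATTCCTGATAA",
    "CD": "GATCCGAATTCTGTGATAA",
    "EF": "GATCCTGGAATTCTGATAA",
}

ECORI_SITE = "GAATTC"


def ecori_frame(top_strand: str) -> int:
    """Offset of the EcoRI site from the start of the linker, modulo 3."""
    return top_strand.index(ECORI_SITE) % 3


frames = {name: ecori_frame(seq) for name, seq in LINKERS.items()}
print(frames)
```

Because the three offsets differ modulo 3, an EcoRI fragment inserted into each vector is read in a different frame, so one of the three constructs places any given insert in frame.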

The HCV cDNA fragments in the designated λgt11 clones were
excised by digestion with EcoRI; each fragment was inserted into pcf1AB,
pcf1CD, and pcf1EF. These expression constructs were then transformed into
E. coli D1210 cells, the transformants cloned, and polypeptides expressed as
described in part B below.

(B) Expression products of the indicated HCV cDNAs were
tested for antigenicity by direct immunological screening of the colonies, using a
modification of the method described in Helfman et al, Proc Nat Acad Sci USA
(1983) 80:31. Briefly, the bacteria were plated onto nitrocellulose filters overlaid
on ampicillin plates to give approximately 40 colonies per filter. Colonies were
replica plated onto nitrocellulose filters, and the replicas were regrown overnight
in the presence of 2 mM IPTG and ampicillin. The bacterial colonies were lysed
by suspending the nitrocellulose filters for about 15 to 20 min in an atmosphere
saturated with CHCl3 vapor. Each filter then was placed in an individual 100 mm
Petri dish containing 10 mL of Tris HCl, pH 7.5, 150 mM NaCl, 5 mM
MgCl2, 3% (w/v) BSA, 40 μg/mL lysozyme, and 0.1 μg/mL DNase. The plates
were agitated gently for at least 8 hours at room temperature. The filters were
rinsed in TBST (50 mM Tris HCl, pH 8.0, 150 mM NaCl, 0.005% Tween® 20).
After incubation, the cell residues were rinsed and incubated for one hour in TBS
(TBST without Tween®) containing 10% sheep serum. The filters were then
incubated with pretreated sera in TBS from individuals with NANBH, which
included 3 chimpanzees; 8 patients with chronic NANBH whose sera were positive
with respect to antibodies to HCV C100-3 polypeptide (also called C100); 8
patients with chronic NANBH whose sera were negative for anti-C100 antibodies;
a convalescent patient whose serum was negative for anti-C100 antibodies; and 6
patients with community-acquired NANBH, including one whose serum was strongly
positive with respect to anti-C100 antibodies, and one whose serum was marginally
positive with respect to anti-C100 antibodies. The sera, diluted in TBS, were
pretreated by preabsorption with hSOD for at least 30 minutes at 37°C. After
incubation, the filters were washed twice for 30 min with TBST. The expressed
proteins which bound antibodies in the sera were labeled by incubation for 2 hours
with 125I-labeled sheep anti-human antibody. The filters were then
washed twice for 30 min with TBST, dried, and autoradiographed.



Example 3 
thSC 

(A) pBR322-C200:

determined essentially as described above, except that the cDNA excised from
these phages were substituted for the cDNA isolated from clone 5-1-1.
Clone C33c was isolated using a hybridization probe having the following sequence:

5' ATC AGG ACC GGG GTG AGA ACA ATT ACC ACT 3'

The sequence of the HCV cDNA in clone C33c is shown in Figure 8, which also
shows the amino acids encoded therein.

Clone 35 was isolated by screening with a synthetic polynucleotide
having the sequence:

5' AAG CCA CCG TGT GCG CTA GGG CTC AAG CCC 3'

Approximately 1 in 50,000 clones hybridized with the probe. The polynucleotide
and deduced amino acid sequences for C35 are shown in Figure 7.

Clone C31 is shown in Figure 6, which also shows the amino acids
encoded therein. A C200 cassette was constructed by ligating together a 718 bp
fragment obtained by digestion of clone C33c DNA with EcoRI and HinfI, a 179
bp fragment obtained by digestion of clone C31 DNA with HinfI and BglI, and a

The 



15 



construct of ligated ^ site of pBR322, yielding
the plasmid pBR322-C200.



(B) C7f+C20c:

Clone 7f was isolated using a probe having the sequence:



The sequence of HCV cDNA in clone 7f and the amino acids encoded therein are
shown in Figure 5.
Clone C20c is isolated using a probe having the following sequence:

ar The sequent ot^ and the amino adds : ! 

25 ; 4^ were then ddned ii^o ^ 

. ibfc^^ ahd'transfc* 

^hbio^^ t:y '"' : ^^4^'^^% 
(C) C300:

Clone 8h was isolated using a probe based on the sequence of nucleotides
in clone 33c. The nucleotide sequence of the probe was

5'-AGA GAC AAC CAT GAG GTC CCC GGT GTT C-3'.

The sequence of the HCV cDNA in clone 8h, and the amino acids encoded
therein, are shown in Figure 4.

Clone C26d is isolated using a probe having the following sequence:

5'-CTG TTG TGC CCC GCG GCA GCC-3'

The sequence and amino acid translation of clone C26d is shown in
Figure 3.

Clones C26d and C33c (see part A above) were transformed into
the methylation minus E. coli strain GM48. Clone C26d was digested with
EcoRII and DdeI to provide a 100 bp fragment. Clone C33c was digested with
EcoRII and EcoRI to provide a 700 bp fragment. Clone C8h was digested with
EcoRI and DdeI to provide a 208 bp fragment. These three fragments were then
ligated into the EcoRI site of pBR322, and transformed into E. coli HB101, to
provide the vector C300.

(D) Preparation of Full Length Clones:
A 600 bp fragment was obtained from C7f+C20c by digestion with
EcoRI and NaeI, and ligated to a 945 bp NaeI/EcoRI fragment from C300, and
the construct inserted into the EcoRI site of pGEM4Z (commercially available
from Promega) to form the vector C7fC20cC300.

C7fC20cC300 was digested with NdeI and EcoRI to provide an 892
bp fragment, which was ligated with a 1160 bp fragment obtained by digesting
C200 with NdeI and EcoRI. The resulting construct was inserted into the EcoRI
Construction of this vec-

Example 4

(Preparation of E. coli Expression Vectors)

(A) cf1SODp600:

This vector contains a full-length HCV protease coding sequence
fused to a functional hSOD leader. The vector C7fC20cC300C200 was cleaved
with EcoRI to provide a 2000 bp fragment, which was then ligated into the EcoRI
site of plasmid cf1CD (Example 2A). The resulting vector encodes amino acids 1-
151 of hSOD, and amino acids 946-1630 of HCV (numbered from the beginning
of the polyprotein, corresponding to amino acids 1-686 in Figure 1). The vector
was labeled cf1SODp600 (sometimes referred to as P600), and was transformed
into E. coli D1210 cells. These cells, ATCC accession no. 68275, were deposited
as set forth below.
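
As a rough consistency check on this construct, the coding capacity of the insert can be compared with the stated residue range. The sketch below is illustrative only and the helper names are hypothetical. It assumes that the approximately 2000 bp EcoRI insert corresponds to the 892 bp and 1160 bp fragments joined in Example 3(D), and that the stated fragment sizes are approximate.

```python
def residues_in_range(first: int, last: int) -> int:
    """Number of residues in an inclusive polyprotein range."""
    return last - first + 1


def max_codons(length_bp: int) -> int:
    """Largest number of complete codons a fragment of this length can hold."""
    return length_bp // 3


hcv_residues = residues_in_range(946, 1630)   # HCV portion encoded by the vector
insert_bp = 892 + 1160                        # fragment sizes from Example 3(D)

print(hcv_residues, insert_bp, max_codons(insert_bp))
```

The two figures agree to within a codon, which is as close as restriction fragment sizes estimated from a gel or map would be expected to come.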

(B) P190:

A truncated SOD-protease fusion polynucleotide was prepared by
excising a 600 bp EcoRI/NaeI fragment from C7f+C20c, blunting the fragment
with Klenow fragment, and ligating the blunted fragment into the Klenow-blunted
EcoRI site of cf1EF (Example 2A). This polynucleotide encodes a fusion protein
having amino acids 1-151 of hSOD, and amino acids 1-199 of HCV protease.

(C) P300:

A longer truncated SOD-protease fusion polynucleotide was prepared
by excising an 892 bp EcoRI/NdeI fragment from C7fC20cC300, blunting
the fragment with Klenow fragment, and ligating the blunted fragment into the
Klenow-blunted EcoRI site of cf1EF. This polynucleotide encodes a fusion protein
having amino acids 1-151 of hSOD, and amino acids 1-299 of HCV protease.

(D) P500:

A longer truncated SOD-protease fusion polynucleotide was prepared
by excising a 1550 bp EcoRI/EcoRI fragment from C7fC20cC300, and ligat-

to form P500. This polynucleotide
encodes a fusion protein having amino acids 1-151 of hSOD, and amino acids 946-
1457 of HCV protease (amino acids 1-513 in Figure 1).
(E) FLAG/Protease Fusion

This vector contains a full-length HCV protease coding sequence
fused to the FLAG sequence, Hopp et al. (1988) Biotechnology 6:1204-1210.
PCR was used to produce a HCV protease gene with special restriction ends for
cloning ease. Plasmid p500 was digested with EcoRI and NdeI to yield a 900 bp
fragment. This fragment and two primers were used in a polymerase chain
reaction to introduce a unique BglII site at amino acid 1009 and a stop codon
with a SalI site at amino acid 1262 of the HCV-1, as shown in Figure 17 of WO
90/11089, published 4 October 1990. The sequence of the primers is as follows:

5' CCC GAG CAA GAT CTC CCG GCC C 3'
and

5' CCC GGC TGC ATA AGC AGT CGA CTT GGA 3'
After 30 cycles of PCR, the reaction was digested with BglII and SalI, and the 710
bp fragment was isolated. This fragment was annealed and ligated to the
following duplex:

MetAspTyrLysAspAspAspAspLysGlyArgGlu
CATGGACTACAAAGACGATGACGATAAAGGCCGCGA

CTGATGTTTCTGCTACTGCTATTTCCGGCCCTCTAG

The duplex encodes the FLAG sequence, an initiator methionine, and a 5' NcoI
restriction site. The resulting NcoI/SalI fragment was ligated into a derivative of
pCF1.

This construct is then transformed into E. coli D1210 cells and expression
of the protease is induced by the addition of IPTG.

The FLAG sequence was fused to the HCV protease to facilitate
purification. A calcium dependent monoclonal antibody, which binds to the
FLAG encoded peptide, is used to purify the fusion protein without harsh eluting
conditions.

Example 5

(E. coli Expression of SOD-Protease Fusion Proteins)

(A) E. coli D1210 cells were transformed with cf1SODp600 and grown in
Luria broth containing 100 μg/mL ampicillin to an OD of 0.3-0.5. IPTG was then
added to a concentration of 2 mM, and the cells cultured to a final OD of 0.9 to
1.3. The cells were then lysed, and the lysate analyzed by Western blot using anti-
HCV sera, as described in USSN 7/456,637.

The results indicated the occurrence of cleavage, as no full length product 
(theoretical Mr 93 kDa) was evident on the gel. Bands corresponding to the 
hSOD fusion partner and the separate HCV protease appeared at relative mol- 
ecular weights of about 34, 53, and 66 kDa. The 34 kDa band corresponds to the 
hSOD partner (about 20 kDa) with a portion of the NS3 domain, while the 53 
and 66 kDa bands correspond to HCV protease with varying degrees of (possibly 
bacterial) processing. 

(B) E. coli D1210 cells were transformed with P500 and grown in Luria
broth containing 100 μg/mL ampicillin to an OD of 0.3-0.5. IPTG was then
added to a concentration of 2 mM, and the cells cultured to a final OD of 0.8 to 

The cells were then lysed, and the lysate analyzed as described above. 
The results again indicated the occurrence of cleavage, as no full length 
product (theoretical Mr 73 kDa) was evident on the gel. Bands corresponding to 
the hSOD fusion partner and the truncated HCV protease appeared at molecular 
weights of about 34 and 45 kDa, respectively. 

(C) E. coli D1210 cells were transformed with vectors P300 and P190
and grown as described above.

The results from P300 expression indicated the occurrence of cleavage, as
no full length product (theoretical Mr 51 kDa) was evident on the gel. A band
corresponding to the hSOD fusion partner appeared at a relative molecular
weight of about 34 kDa. The corresponding HCV protease band was not visible, as
this region of the NS3 domain is not recognized by the sera employed to detect
the products. However, appearance of the hSOD band at 34 kDa rather than 51
kDa indicates that cleavage occurred.

The P190 expression product appeared only as the full (encoded) length 
product without cleavage, forming a band at about 40 kDa, which corresponds to 
the theoretical molecular weight for the uncleaved product. This may indicate 
that the minimum essential sequence for HCV protease extends to the region 
between amino acids 199 and 299. 
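
The theoretical molecular weights quoted in this Example can be approximated from the residue counts given in Example 4. The sketch below is a rough illustration only: the average residue mass of 0.110 kDa is a common rule of thumb and an assumption here, not a value from the text, and the names are hypothetical. Each construct is taken as 151 hSOD residues plus the stated number of HCV residues.

```python
AVG_RESIDUE_KDA = 0.110  # assumed average mass per amino acid residue

# (hSOD residues, HCV residues) per construct, from Example 4.
CONSTRUCTS = {
    "P600": (151, 686),
    "P500": (151, 513),
    "P300": (151, 299),
    "P190": (151, 199),
}

# Theoretical Mr values (kDa) as stated in Example 5.
STATED_KDA = {"P600": 93, "P500": 73, "P300": 51, "P190": 40}


def approx_mr_kda(n_residues: int) -> float:
    """Approximate molecular weight in kDa from residue count."""
    return n_residues * AVG_RESIDUE_KDA


estimates = {name: approx_mr_kda(sod + hcv) for name, (sod, hcv) in CONSTRUCTS.items()}
for name, est in estimates.items():
    print(name, round(est, 1), STATED_KDA[name])
```

Under this approximation each estimate falls within about 2 kDa of the stated theoretical value. An exact figure would require summing the masses of the actual residues in each sequence.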

Example 6 

(Purification of E. coli Expressed Protease) 
The HCV protease and fragments expressed in Example 5 may be purified 
as follows: 

The bacterial cells in which the polypeptide was expressed are subjected to
osmotic shock and mechanical disruption, the insoluble fraction containing the
protease is isolated and subjected to differential extraction with an alkaline-NaCl
solution, and the polypeptide in the extract purified by chromatography on
columns of S-Sepharose® and Q-Sepharose®.

The crude extract resulting from osmotic shock and mechanical disruption
is prepared by suspending 1 g of the packed cells in 10 mL of a solution
containing 0.02 M Tris-HCl, pH 7.5, 10 mM EDTA, 20% sucrose, and incubating for
10 minutes on ice. The cells are then pelleted by centrifugation at 4,000 x g for
15 min at 4°C. After the supernatant is removed, the cell pellets are resuspended
in 10 mL of Buffer A1 (0.01 M Tris-HCl, pH 7.5, 1 mM EDTA, 14 mM
β-mercaptoethanol - "BME"), and incubated on ice for 10 minutes. The cells are again
pelleted at 4,000 x g for 15 minutes at 4°C. After removal of the clear
supernatant (periplasmic fraction I), the cell pellets are resuspended in Buffer A1,
incubated on ice for 10 minutes, and again centrifuged at 4,000 x g for 15 minutes at
4°C. The clear supernatant (periplasmic fraction II) is removed, and the cell
pellet resuspended in 5 mL of Buffer A2 (0.02 M Tris-HCl, pH 7.5, 14 mM BME,
1 mM EDTA, 1 mM PMSF). In order to disrupt the cells, the suspension (5 mL)
and 7.5 mL of Dyno-Mill lead-free acid-washed glass beads (0.10-0.15 mm
diameter) (available from Glen-Mills, Inc.) are placed in a Falcon tube and vortexed at
top speed for two minutes, followed by cooling for at least 2 min on ice. The
vortexing-cooling procedure is repeated another four times. After vortexing, the
slurry is filtered through a sintered glass funnel using low suction, the glass beads
are washed twice with Buffer A2, and the filtrate and washes are combined.
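
The buffer compositions above are stated as final concentrations. As a minimal sketch of how the 10 mL of Buffer A1 might be made up from concentrated stocks, the dilution relation C1·V1 = C2·V2 can be applied per component. The stock concentrations below are assumptions chosen for illustration (they are not given in the text), and the helper names are hypothetical.

```python
def stock_volume_ml(final_molar, final_volume_ml, stock_molar):
    """Volume of stock needed for one component, from C1 * V1 = C2 * V2."""
    return final_molar * final_volume_ml / stock_molar


# Hypothetical stock concentrations in mol/L (assumed, not from the text).
STOCKS = {"Tris-HCl pH 7.5": 1.0, "EDTA": 0.5, "BME": 14.3}

# Final concentrations of Buffer A1 as stated in the text, in mol/L.
BUFFER_A1 = {"Tris-HCl pH 7.5": 0.01, "EDTA": 0.001, "BME": 0.014}


def recipe(buffer, final_volume_ml):
    """Return the volume (mL) of each stock plus water to reach final_volume_ml."""
    volumes = {
        name: stock_volume_ml(conc, final_volume_ml, STOCKS[name])
        for name, conc in buffer.items()
    }
    volumes["water"] = final_volume_ml - sum(volumes.values())
    return volumes


a1 = recipe(BUFFER_A1, 10.0)
for name, vol in a1.items():
    print(f"{name}: {vol * 1000:.1f} uL")
```

The same function could be reused for Buffer A2 by adding a PMSF stock to the assumed stock table.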

The insoluble fraction of the crude extract is collected by centrifugation at
20,000 x g for 15 min at 4°C, washed twice with 10 mL Buffer A2, and
resuspended in 5 mL of MILLI-Q water.

A fraction containing the HCV protease is isolated from the insoluble
material by adding to the suspension NaOH (2 M) and NaCl (2 M) to yield a
final concentration of [...] mM each, vortexing the mixture for 1 minute, centrifuging
it at 20,000 x g for 20 min at 4°C, and retaining the supernatant.

The partially purified protease is then purified by SDS-PAGE. The protease
may be identified by western blot, and the band excised from the gel. The
protease is then eluted from the band, and analyzed to confirm its amino acid
sequence. N-terminal sequences may be analyzed using an automated amino acid
sequencer, while C-terminal sequences may be analyzed by automated amino acid
[...]






Example 7
(Preparation of Yeast Expression Vector)
(A) p650 (SOD/Protease Fusion)

This vector contains HCV sequence, which includes the wild-type full-
length HCV protease coding sequence, fused at the 5' end to a SOD coding
sequence. Two fragments, a 441 bp EcoRI/BglII fragment from clone 11b and a
1471 bp BglII/EcoRI fragment from expression vector P500, were used to
reconstruct a wild-type, full-length HCV protease coding sequence. These two
fragments were ligated together with an EcoRI digested pS356 vector to produce
an expression cassette. The expression cassette encodes the ADH2/GAPDH
hybrid yeast promoter, human SOD, the HCV protease, and a GAPDH
transcription terminator. The resulting vector was digested with BamHI and a
4052 bp fragment was isolated. This fragment was ligated to the BamHI digested
pAB24 vector to produce p650. p650 expresses a polyprotein containing, from its
amino terminal end, amino acids 1-154 of hSOD, an oligopeptide -Asn-Leu-Gly-
Ile-Arg-, and amino acids 819 to 1458 of HCV-1, as shown in Figure 17 of WO
90/11089, published 4 October 1990.
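
The composition of the p650 polyprotein described above implies a definite chain length. The following minimal sketch simply adds up the three segments named in the text; the variable names are illustrative, and the total is the length of the encoded fusion before any proteolytic processing.

```python
# Segments of the p650 fusion as listed in the text.
HSOD_RESIDUES = 154                           # amino acids 1-154 of hSOD
LINKER = ["Asn", "Leu", "Gly", "Ile", "Arg"]  # oligopeptide linker
HCV_FIRST, HCV_LAST = 819, 1458               # HCV-1 residues, inclusive

# Inclusive range, so add one.
hcv_residues = HCV_LAST - HCV_FIRST + 1

# Length of the full encoded polyprotein.
fusion_residues = HSOD_RESIDUES + len(LINKER) + hcv_residues

print(hcv_residues, fusion_residues)
```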

Clone 11b was isolated from the genomic library of HCV cDNA, ATCC
accession no. 40394, as described above in Example 3A, using a hybridization
probe having the following sequence:

This procedure is also described in EPO Pub. No. 318 216, Example IV.A.17.

The vector pS3EF, which is a pBR322 derivative, contains the
ADH2/GAPDH hybrid yeast promoter upstream of the human superoxide
dismutase gene, an adaptor, and a downstream yeast effective transcription
terminator. A similar expression vector containing these control elements and the
superoxide dismutase gene is described in Cousens et al. (1987) Gene 61: 265, and
in copending application EPO 196,056, published October 1, 1986. pS3EF,
however, differs from that in Cousens et al. in that the heterologous proinsulin
gene and the immunoglobulin hinge are deleted, and Gln154 of SOD is followed by
an adaptor sequence which contains an EcoRI site. The sequence of the adaptor is:

5' AAT TTG GGA ATT CCA TAA TTA ATT AAG 3'

3'      AC CCT TAA GGT ATT AAT TAA TTC AGCT 5'

The EcoRI site facilitates the insertion of heterologous sequences. Once inserted
into pS3EF, a SOD fusion is expressed which contains an oligopeptide that links
SOD to the heterologous sequences. pS3EF is exactly the same as pS356 except
that pS356 contains a different adaptor. The sequence of the adaptor is shown
below:

5' AAT TTG GGA ATT CCA TAA TGA G 3'

3'      AC CCT TAA GGT ATT ACT CAG CT 5'

pS356, ATCC accession no. 67683, is deposited as set forth below.
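
The two adaptors can be checked for internal consistency: the lower strand should be the complement of the upper strand, the upper strand should contain the EcoRI site (GAATTC), and the reading frame should begin Asn-Leu-Gly-Ile. The sketch below is illustrative only. The sequences are entered with the two strands read as complementary, and the 4-nucleotide stagger and the resulting single-stranded ends are an interpretation of the printed layout rather than a statement from the text; the function names are hypothetical.

```python
from itertools import product

COMPLEMENT = str.maketrans("ACGT", "TGCA")

# Standard genetic code, codons ordered T, C, A, G at each position.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def strands_pair(top, bottom_3to5, offset):
    """True if the bottom strand (written 3'->5') pairs with top from offset on."""
    overlap = len(top) - offset
    return top[offset:].translate(COMPLEMENT) == bottom_3to5[:overlap]


def translate(seq):
    """Translate from the first base, stopping at the first stop codon."""
    peptide = []
    for i in range(0, len(seq) - 2, 3):
        aa = CODON_TABLE[seq[i:i + 3]]
        if aa == "*":
            break
        peptide.append(aa)
    return "".join(peptide)


# (top 5'->3', bottom 3'->5'), spaces removed.
ADAPTORS = {
    "pS3EF": ("AATTTGGGAATTCCATAATTAATTAAG", "ACCCTTAAGGTATTAATTAATTCAGCT"),
    "pS356": ("AATTTGGGAATTCCATAATGAG", "ACCCTTAAGGTATTACTCAGCT"),
}

results = {}
for name, (top, bottom) in ADAPTORS.items():
    results[name] = {
        "strands_pair": strands_pair(top, bottom, offset=4),
        "ecori_index": top.find("GAATTC"),
        "peptide": translate(top),
        "top_overhang": top[:4],
        "bottom_overhang_3to5": bottom[len(top) - 4:],
    }
    print(name, results[name])
```

In the uncut adaptor the frame reads Asn-Leu-Gly-Ile-Pro followed by a stop codon; once a fragment is inserted at the EcoRI site, the codon after Ile is completed by the insert, which is consistent with the -Asn-Leu-Gly-Ile-Arg- linker described for p650.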

Plasmid pAB24 is a yeast shuttle vector, which contains pBR322
sequences, the complete 2μ sequence for DNA replication in yeast (Broach (1981)
in: Molecular Biology of the Yeast Saccharomyces, Vol. 1, p. 445, Cold Spring
Harbor Press) and the yeast LEU2d gene derived from plasmid pC1/1, described
in EPO Pub. No. 116 201. Plasmid pAB24 was constructed by digesting YEp24
with EcoRI and re-ligating the vector to remove the partial 2 micron sequences.
The resulting plasmid, YEp24deltaRI, was linearized with ClaI and ligated with
the complete 2 micron plasmid which had been linearized with ClaI. The
resulting plasmid, pCBou, was then digested with XbaI, and the 8605 bp vector
fragment was gel isolated. This isolated XbaI fragment was ligated with a 4460
bp XbaI fragment containing the LEU2d gene isolated from pC1/1; the orientation
of the LEU2d gene is in the same direction as the URA3 gene.

S. cerevisiae, 2150-2-3 (pAB24-GAP-env2), accession no. 20827, is
deposited with the American Type Culture Collection as set forth below. The
plasmid pAB24-GAP-env2 can be recovered from the yeast cells by known
techniques. The GAP-env2 expression cassette can be removed by digesting
pAB24-GAP-env2 with BamHI. pAB24 is recovered by religating the vector
without the BamHI insert.

Example 8 

(Yeast Expression of SOD-Protease Fusion Protein)
p650 was transformed in S. cerevisiae strain JSC310, Mata, leu2,
ura3-52, prb1-1122, pep4-3, prc1-407, cir°::DM15 (G418 resistance). The
transformation is as described by Hinnen et al. (1978) Proc Natl Acad Sci USA
75: 1929. The transformed cells were selected on ura- plates with 8% glucose.
The plates were incubated at 30°C for 4-5 days. The transformants were further
selected on leu- plates with 8% glucose putatively for high numbers of the p650
plasmid. Colonies from the leu- plates were inoculated into leu- medium with 3%
glucose. These cultures were shaken at 30°C for 2 days and then diluted 1/20
into YEPD medium with 2% glucose and shaken for 2 more days at 30°C.

S. cerevisiae JSC310 contains DM15 DNA, described in EPO Pub.
No. 340 986, published 8 November 1989. This DM15 DNA enhances ADH2
regulated expression of heterologous proteins. pDM15, accession no. 40453, is
deposited as set forth below.




Example 9

(Yeast Ubiquitin Expression of Mature HCV Protease)
Mature HCV protease is prepared by cleaving vector [...]
a 2 Kb coding sequence, and inserting
the sequence with the appropriate linkers into a ubiquitin expression vector, such
as that described in WO 88/02406, published 7 April 1988, or USSN 7/390,599,
filed S August 1989, incorporated herein by reference. Mature HCV protease is
recovered upon expression of the vector in suitable hosts, particularly yeast.

Example 10
(Preparation of an In-Vitro Expression Vector)
(A) pGEM®-3Z/Yellow Fever Leader Vector

Four synthetic DNA fragments were annealed and ligated
together to create an EcoRI/SacI Yellow Fever leader, which was ligated to an
EcoRI/SacI digested pGEM®-3Z vector from Promega®. The sequences of the
four fragments are listed below:
YFK-1:

5' AAT TCG TAA ATC GTG TGT GCT AAT TGA GGT GCA TTG GTC

YFK-2:

YFK-3:

DNA sequence of p6000 is identical to nucleotide -275 to nucleotide 6372 of
Figure 17 of WO 90/11089, published 4 October 1990. p6000 was digested with
PvuII, and from the digest, a 2,864 bp fragment was isolated. This 2,864 bp
fragment was ligated to the prepared pGEM®-3Z/Yellow Fever leader vector
fragment, described above.




Example 11 

(In-Vitro Expression of HCV Protease)
(A) Transcription

The pGEM®-3Z/Yellow Fever leader/PvuII vector was linearized
with XbaI and transcribed using the materials and protocols from Promega's
Riboprobe® Gemini II Core system.

(B) Translation

The RNA produced by the above protocol was translated using
Promega's rabbit reticulocyte lysate, minus methionine, canine pancreatic
microsomal membranes, as well as other necessary materials and instructions
from Promega.

Deposited Biological Materials:

The following materials were deposited with the American Type Culture
Collection (ATCC):

Name                                 Deposit Date    Accession No.

[...]                                23 Mar 1990     68275
Cf1/5-1-1 in E. coli D1210           11 May 1989     67967
Bacteriophage λ-gt11 cDNA library    01 Dec 1987     40394
E. coli HB101, pS356                 29 Apr 1988     67683
plasmid DNA, pDM15                   05 May 1988     40453
S. cerevisiae, 2150-2-3              23 Dec 1986     20827
(pAB24-GAP-env2)


The above materials have been deposited with the ATCC under the
accession numbers indicated. These deposits will be maintained under the terms
of the Budapest Treaty on the International Recognition of the Deposit of
Microorganisms for purposes of Patent Procedure. These deposits are provided as a
convenience to those of skill in the art, and are not an admission that a deposit is
required under 35 U.S.C. §112. The polynucleotide sequences contained in the
deposited materials, as well as the amino acid sequence of the polypeptides
encoded thereby, are incorporated herein by reference and are controlling in the
event of conflict with the sequences described herein. A license may be
required to make, use or sell the deposited materials, and no such license is
granted hereby.

WHAT IS CLAIMED:

1. A composition comprising a purified proteolytic polypeptide 
derived from Hepatitis C virus. 

2. The composition of claim 1, wherein said polypeptide has a
partial internal sequence substantially as follows:

...Trp Thr Val Tyr His Gly Ala Gly Thr Arg Thr... .

3. The composition of claim 1, wherein said polypeptide has a
partial internal sequence substantially as follows:

...Leu Lys Gly Ser Ser Gly Gly Pro Leu... .



4. The composition of claim 1, wherein said polypeptide has
substantially the partial internal sequence: [...]

5. The composition of claim 1, wherein said polypeptide has
substantially the amino acid sequence shown in Figure 1.

6. A fusion protein, comprising:
a suitable fusion partner, fused to
a proteolytic polypeptide derived from Hepatitis C virus.



7. The fusion protein of claim 6, wherein said fusion partner
comprises human superoxide dismutase.

8. The fusion protein of claim 6, wherein said proteolytic
polypeptide has a partial internal sequence substantially as follows:

...Trp Thr Val Tyr His Gly Ala Gly Thr Arg Thr... .

9. The fusion protein of claim 6, wherein said proteolytic
polypeptide has a partial internal sequence substantially as follows:

...Leu Lys Gly Ser Ser Gly Gly Pro Leu... .

10. The fusion protein of claim 6, wherein said proteolytic
polypeptide has as a partial internal sequence:

Arg Arg Gly Arg Glu Ile Leu Leu Gly Pro Ala Asp Gly Met Val
Ser Lys Gly Trp Arg Leu Leu Ala Pro Ile Thr Ala Tyr Ala Gln Gln
Thr Arg Gly Leu Leu Gly Cys Ile Ile Thr Ser Leu Thr Gly Arg Asp
Lys Asn Gln Val Glu Gly Glu Val Gln Ile Val Ser Thr Ala Ala Gln
Thr Phe Leu Ala Thr Cys Ile Asn Gly Val Cys Trp Thr Val Tyr His
Gly Ala Gly Thr Arg Thr Ile Ala Ser Pro Lys Gly Pro Val Ile Gln
Met Tyr Thr Asn Val Asp Gln Asp Leu Val Gly Trp Pro Ala Pro
Gln Gly Ser Arg Ser Leu Thr Pro Cys Thr Cys Gly Ser Ser Asp Leu
Tyr Leu Val Thr Arg His Ala Asp Val Ile Pro Val Arg Arg Arg Gly
Asp Ser Arg Gly Ser Leu Leu Ser Pro Arg Pro Ile Ser Tyr Leu Lys
Gly Ser Ser Gly Gly Pro Leu Leu Cys Pro Ala Gly His Ala Val Gly
Ile Phe Arg Ala Ala Val Cys Thr Arg Gly Val Ala Lys Ala Val Asp
Phe Ile Pro Val Glu Asn Leu Glu Thr Thr Met Arg.

11. The fusion protein of claim 6, wherein said fusion partner is [...]

12. A composition comprising a polynucleotide which encodes only
the HCV protease or an active HCV protease analog.



13. The composition of claim 12, wherein said polynucleotide
encodes the HCV protease of Figure 1.

14. A composition comprising a polynucleotide which encodes a
fusion protein comprising:

HCV protease or HCV protease analog; and
a fusion partner.

15. The composition of claim 14, wherein said fusion partner is
selected from the group consisting of hSOD, yeast α-factor, IL-2S, ubiquitin,
β-galactosidase, β-lactamase, horseradish peroxidase, glucose oxidase, and urease.

16. The composition of claim 14, wherein said HCV protease or
HCV protease analog comprises a polypeptide having substantially the following
sequence:

Arg Arg Gly Arg Glu Ile Leu Leu Gly Pro Ala Asp Gly Met Val
Ser Lys Gly Trp Arg Leu Leu Ala Pro Ile Thr Ala Tyr Ala Gln Gln
Thr Arg Gly Leu Leu Gly Cys Ile Ile Thr Ser Leu Thr Gly Arg Asp
Lys Asn Gln Val Glu Gly Glu Val Gln Ile Val Ser Thr Ala Ala Gln
Thr Phe Leu Ala Thr Cys Ile Asn Gly Val Cys Trp Thr Val Tyr His
Gly Ala Gly Thr Arg Thr Ile Ala Ser Pro Lys Gly Pro Val Ile Gln
Met Tyr Thr Asn Val Asp Gln Asp Leu Val Gly Trp Pro Ala Pro
Gln Gly Ser Arg Ser Leu Thr Pro Cys Thr Cys Gly Ser Ser Asp Leu
Tyr Leu Val Thr Arg His Ala Asp Val Ile Pro Val Arg Arg Arg Gly
Asp Ser Arg Gly Ser Leu Leu Ser Pro Arg Pro Ile Ser Tyr Leu Lys
Gly Ser Ser Gly Gly Pro Leu Leu Cys Pro Ala Gly His Ala Val Gly
Ile Phe Arg Ala Ala Val Cys Thr Arg Gly Val Ala Lys Ala Val Asp
Phe Ile Pro Val Glu Asn Leu Glu Thr Thr Met Arg.

17. The composition of claim 14, wherein said HCV protease or
analog comprises a polypeptide having substantially the sequence:

Gly Thr Tyr Val Tyr Asn His Leu Thr Pro Leu Arg Asp Trp Ala
His Asn Gly Leu Arg Asp Leu Ala Val Ala Val Glu Pro Val Val
Phe Ser Gln Met Glu Thr Lys Leu Ile Thr Trp Gly Ala Asp Thr
Ala Ala Cys Gly Asp Ile Ile Asn Gly Leu Pro Val Ser Ala Arg Arg
Gly Arg Glu Ile Leu Leu Gly Pro Ala Asp Gly Met Val Ser Lys Gly
Trp Arg Leu Leu Ala Pro Ile Thr Ala Tyr Ala Gln Gln Thr Arg Gly
Leu Leu Gly Cys Ile Ile Thr Ser Leu Thr Gly Arg Asp Lys Asn Gln
Val Glu Gly Glu Val Gln Ile Val Ser Thr Ala Ala Gln Thr Phe Leu
Ala Thr Cys Ile Ile Asn Gly Val Cys Trp Thr Val Tyr His Gly Ala
Gly Thr Arg Thr Ile Ala Ser Pro Lys Gly Pro Val Ile Gln Met Tyr
Thr Asn Val Asp Gln Asp Leu Val Gly Trp Pro Ala Ser Gln Gly
Thr Arg Ser Leu Thr Pro Cys Thr Cys Gly Ser Ser Asp Leu Tyr Leu
Val Thr Arg His Ala Asp Val Ile Pro Val Arg Arg Arg Gly Asp Ser
Arg Gly Ser Leu Leu Ser Pro Arg Pro Ile Ser Tyr Leu Lys Gly Ser
Ser Gly Gly Pro Leu Leu Cys Pro Ala Gly His Ala Val Gly Ile Phe
Arg Ala Ala Val Cys Thr Arg Gly Val Ala Lys Ala Val Asp Phe Ile
Pro Val Glu Asn Leu Glu Thr Thr Met Arg Ser Pro Val Phe Thr
Asp Asn Ser Ser Pro Pro Val Val Pro Gln Ser Phe Gln Val Ala His
Leu His Ala Pro Thr Gly Ser Gly Lys Ser Thr Lys Val Pro Ala Ala.

18. The composition of claim 14, wherein said polypeptide has
substantially the sequence:

Gly Thr Tyr Val Tyr Asn His Leu Thr Pro Leu Arg Asp Trp Ala
His Asn Gly Leu Arg Asp Leu Ala Val Ala Val Glu Pro Val Val
Phe Ser Gln Met Glu Thr Lys Leu Ile Thr Trp Gly Ala Asp Thr
Ala Ala Cys Gly Asp Ile Ile Asn Gly Leu Pro Val Ser Ala Arg Arg
Gly Arg Glu Ile Leu Leu Gly Pro Ala Asp Gly Met Val Ser Lys Gly
Trp Arg Leu Leu Ala Pro Ile Thr Ala Tyr Ala Gln Gln Thr Arg Gly
Leu Leu Gly Cys Ile Ile Thr Ser Leu Thr Gly Arg Asp Lys Asn Gln
Val Glu Gly Glu Val Gln Ile Val Ser Thr Ala Ala Gln Thr Phe Leu
Ala Thr Cys Ile Ile Asn Gly Val Cys Trp Thr Val Tyr His Gly Ala
Gly Thr Arg Thr Ile Ala Ser Pro Lys Gly Pro Val Ile Gln Met Tyr
Thr Asn Val Asp Gln Asp Leu Val Gly Trp Pro Ala Ser Gln Gly
Thr Arg Ser Leu Thr Pro Cys Thr Cys Gly Ser Ser Asp Leu Tyr Leu
Val Thr Arg His Ala Asp Val Ile Pro Val Arg.

19. A method for assaying compounds for activity against hepatitis C
virus, comprising:

providing an active hepatitis C virus protease;
contacting said protease with a compound capable of inhibiting serine
protease activity; and
measuring inhibition of the proteolytic activity of said hepatitis C virus
protease.

20. An expression vector for producing HCV protease or HCV
protease analogs in a host cell, which vector comprises:

a polynucleotide encoding HCV protease or an HCV analog;
transcriptional and translational regulatory sequences functional in said
host cell operably linked to said HCV protease-encoding polynucleotide; and
a selectable marker.

21. The vector of claim 20, which further comprises a sequence
encoding a fusion partner, linked to said HCV protease-encoding polynucleotide
to form a fusion protein upon expression.

22. The vector of claim 21, wherein said fusion partner is selected
from the group consisting of hSOD, yeast α-factor, IL-2S, ubiquitin,
β-galactosidase, β-lactamase, horseradish peroxidase, glucose oxidase, and urease.



23. The vector of claim 22, wherein said fusion partner is selected
from the group consisting of ubiquitin, hSOD, and yeast α-factor.

24. The vector of claim 20, wherein said HCV protease-encoding
polynucleotide encodes a polypeptide having substantially the following
sequence:

Arg Arg Gly Arg Glu Ile Leu Leu Gly Pro Ala Asp Gly Met Val
Ser Lys Gly Trp Arg Leu Leu Ala Pro Ile Thr Ala Tyr Ala Gln Gln
Thr Arg Gly Leu Leu Gly Cys Ile Ile Thr Ser Leu Thr Gly Arg Asp
Lys Asn Gln Val Glu Gly Glu Val Gln Ile Val Ser Thr Ala Ala Gln
Thr Phe Leu Ala Thr Cys Ile Asn Gly Val Cys Trp Thr Val Tyr His
Gly Ala Gly Thr Arg Thr Ile Ala Ser Pro Lys Gly Pro Val Ile Gln
Met Tyr Thr Asn Val Asp Gln Asp Leu Val Gly Trp Pro Ala Pro
Gln Gly Ser Arg Ser Leu Thr Pro Cys Thr Cys Gly Ser Ser Asp Leu
Tyr Leu Val Thr Arg His Ala Asp Val Ile Pro Val Arg Arg Arg Gly
Asp Ser Arg Gly Ser Leu Leu Ser Pro Arg Pro Ile Ser Tyr Leu Lys
Gly Ser Ser Gly Gly Pro Leu Leu Cys Pro Ala Gly His Ala Val Gly
Ile Phe Arg Ala Ala Val Cys Thr Arg Gly Val Ala Lys Ala Val Asp
Phe Ile Pro Val Glu Asn Leu Glu Thr Thr Met Arg.

25. The vector of claim 20, wherein said HCV protease-encoding
polynucleotide encodes a polypeptide having substantially the following
sequence:

Gly Thr Tyr Val Tyr Asn His Leu Thr Pro Leu Arg Asp Trp Ala
His Asn Gly Leu Arg Asp Leu Ala Val Ala Val Glu Pro Val Val
Phe Ser Gln Met Glu Thr Lys Leu Ile Thr Trp Gly Ala Asp Thr
Ala Ala Cys Gly Asp Ile Ile Asn Gly Leu Pro Val Ser Ala Arg Arg
Gly Arg Glu Ile Leu Leu Gly Pro Ala Asp Gly Met Val Ser Lys Gly
Trp Arg Leu Leu Ala Pro Ile Thr Ala Tyr Ala Gln Gln Thr Arg Gly
Leu Leu Gly Cys Ile Ile Thr Ser Leu Thr Gly Arg Asp Lys Asn Gln
Val Glu Gly Glu Val Gln Ile Val Ser Thr Ala Ala Gln Thr Phe Leu
Ala Thr Cys Ile Ile Asn Gly Val Cys Trp Thr Val Tyr His Gly Ala
Gly Thr Arg Thr Ile Ala Ser Pro Lys Gly Pro Val Ile Gln Met Tyr
Thr Asn Val Asp Gln Asp Leu Val Gly Trp Pro Ala Ser Gln Gly
Thr Arg Ser Leu Thr Pro Cys Thr Cys Gly Ser Ser Asp Leu Tyr Leu
Val Thr Arg His Ala Asp Val Ile Pro Val Arg.

26. The vector of claim 20, wherein said HCV protease-encoding
polynucleotide encodes a polypeptide having substantially the following
sequence:

Gly Thr Tyr Val Tyr Asn His Leu Thr Pro Leu Arg Asp Trp Ala
His Asn Gly Leu Arg Asp Leu Ala Val Ala Val Glu Pro Val Val
Phe Ser Gln Met Glu Thr Lys Leu Ile Thr Trp Gly Ala Asp Thr
Ala Ala Cys Gly Asp Ile Ile Asn Gly Leu Pro Val Ser Ala Arg Arg
Gly Arg Glu Ile Leu Leu Gly Pro Ala Asp Gly Met Val Ser Lys Gly
Trp Arg Leu Leu Ala Pro Ile Thr Ala Tyr Ala Gln Gln Thr Arg Gly
Leu Leu Gly Cys Ile Ile Thr Ser Leu Thr Gly Arg Asp Lys Asn Gln
Val Glu Gly Glu Val Gln Ile Val Ser Thr Ala Ala Gln Thr Phe Leu
Ala Thr Cys Ile Ile Asn Gly Val Cys Trp Thr Val Tyr His Gly Ala
Gly Thr Arg Thr Ile Ala Ser Pro Lys Gly Pro Val Ile Gln Met Tyr
Thr Asn Val Asp Gln Asp Leu Val Gly Trp Pro Ala Ser Gln Gly
Thr Arg Ser Leu Thr Pro Cys Thr Cys Gly Ser Ser Asp Leu Tyr Leu
Val Thr Arg His Ala Asp Val Ile Pro Val Arg Arg Arg Gly Asp Ser
Arg Gly Ser Leu Leu Ser Pro Arg Pro Ile Ser Tyr Leu Lys Gly Ser
Ser Gly Gly Pro Leu Leu Cys Pro Ala Gly His Ala Val Gly Ile Phe
Arg Ala Ala Val Cys Thr Arg Gly Val Ala Lys Ala Val Asp Phe Ile
Pro Val Glu Asn Leu Glu Thr Thr Met Arg Ser Pro Val Phe Thr
Asp Asn Ser Ser Pro Pro Val Val Pro Gln Ser Phe Gln Val Ala His
Leu His Ala Pro Thr Gly Ser Gly Lys Ser Thr Lys Val Pro Ala Ala.

        1               5                   10
        Gly Thr Tyr Val Tyr Asn His Leu Thr Pro Leu Arg Asp Trp
ATT CGG GGC ACC TAT GTT TAT AAC CAT CTC ACT CCT CTT CGG GAC TGG
TAA GCC CCG TGG ATA CAA ATA TTG GTA GAG TGA GGA GAA GCC CTG ACC

15                  20                  25                  30
Ala His Asn Gly Leu Arg Asp Leu Ala Val Ala Val Glu Pro Val Val
GCG CAC AAC GGC TTG CGA GAT CTG GCC GTG GCT GTA GAG CCA GTC GTC
CGC GTG TTG CCG AAC GCT CTA GAC CGG CAC CGA CAT CTC GGT CAG CAG

                35                  40                  45
Phe Ser Gln Met Glu Thr Lys Leu Ile Thr Trp Gly Ala Asp Thr Ala
TTC TCC CAA ATG GAG ACC AAG CTC ATC ACG TGG GGG GCA GAT ACC GCC
AAG AGG GTT TAC CTC TGG TTC GAG TAG TGC ACC CCC CGT CTA TGG CGG

            50                  55                  60
Ala Cys Gly Asp Ile Ile Asn Gly Leu Pro Val Ser Ala Arg Arg Gly
GCG TGC GGT GAC ATC ATC AAC GGC TTG CCT GTT TCC GCC CGC AGG GGC
CGC ACG CCA CTG TAG TAG TTG CCG AAC GGA CAA AGG CGG GCG TCC CCG

        65                  70                  75
Arg Glu Ile Leu Leu Gly Pro Ala Asp Gly Met Val Ser Lys Gly Trp
CGG GAG ATA CTG CTC GGG CCA GCC GAT GGA ATG GTC TCC AAG GGT TGG
GCC CTC TAT GAC GAG CCC GGT CGG CTA CCT TAC CAG AGG TTC CCA ACC

    80                  85                  90
Arg Leu Leu Ala Pro Ile Thr Ala Tyr Ala Gln Gln Thr Arg Gly Leu
AGG TTG CTG GCG CCC ATC ACG GCG TAC GCC CAG CAG ACA AGG GGC CTC
TCC AAC GAC CGC GGG TAG TGC CGC ATG CGG GTC GTC TGT TCC CCG GAG

95                  100                 105                 110
Leu Gly Cys Ile Ile Thr Ser Leu Thr Gly Arg Asp Lys Asn Gln Val
CTA GGG TGC ATA ATC ACC AGC CTA ACT GGC CGG GAC AAA AAC CAA GTG
GAT CCC ACG TAT TAG TGG TCG GAT TGA CCG GCC CTG TTT TTG GTT CAC

                115                 120                 125
Glu Gly Glu Val Gln Ile Val Ser Thr Ala Ala Gln Thr Phe Leu Ala
GAG GGT GAG GTC CAG ATT GTG TCA ACT GCT GCC CAA ACC TTC CTG GCA
CTC CCA CTC CAG GTC TAA CAC AGT TGA CGA CGG GTT TGG AAG GAC CGT

Figure 1

            130                 135                 140
Thr Cys Ile Ile Asn Gly Val Cys Trp Thr Val Tyr His Gly Ala Gly
ACG TGC ATC ATC AAT GGG GTG TGC TGG ACT GTC TAC CAC GGG GCC GGA
TGC ACG TAG TAG TTA CCC CAC ACG ACC TGA CAG ATG GTG CCC CGG CCT

        145                 150                 155
Thr Arg Thr Ile Ala Ser Pro Lys Gly Pro Val Ile Gln Met Tyr Thr
ACG AGG ACC ATC GCG TCA CCC AAG GGT CCT GTC ATC CAG ATG TAT ACC
TGC TCC TGG TAG CGC AGT GGG TTC CCA GGA CAG TAG GTC TAC ATA TGG

    160                 165                 170
Asn Val Asp Gln Asp Leu Val Gly Trp Pro Ala Ser Gln Gly Thr Arg
AAT GTA GAC CAA GAC CTT GTG GGC TGG CCC GCT TCG CAA GGT ACC CGC
TTA CAT CTG GTT CTG GAA CAC CCG ACC GGG CGA AGC GTT CCA TGG GCG

175                 180                 185                 190
Ser Leu Thr Pro Cys Thr Cys Gly Ser Ser Asp Leu Tyr Leu Val Thr
TCA TTG ACA CCC TGC ACT TGC GGC TCC TCG GAC CTT TAC CTG GTC ACG
AGT AAC TGT GGG ACG TGA ACG CCG AGG AGC CTG GAA ATG GAC CAG TGC

                195                 200                 205
Arg His Ala Asp Val Ile Pro Val Arg Arg Arg Gly Asp Ser Arg Gly
AGG CAC GCC GAT GTC ATT CCC GTG CGC CGG CGG GGT GAT AGC AGG GGC
TCC GTG CGG CTA CAG TAA GGG CAC GCG GCC GCC CCA CTA TCG TCC CCG
                                 NaeI

            210                 215                 220
Ser Leu Leu Ser Pro Arg Pro Ile Ser Tyr Leu Lys Gly Ser Ser Gly
AGC CTG CTG TCG CCC CGG CCC ATT TCC TAC TTG AAA GGC TCC TCG GGG
TCG GAC GAC AGC GGG GCC GGG TAA AGG ATG AAC TTT CCG AGG AGC CCC

        225                 230                 235
Gly Pro Leu Leu Cys Pro Ala Gly His Ala Val Gly Ile Phe Arg Ala
GGT CCG CTG TTG TGC CCC GCG GGG CAC GCC GTG GGC ATA TTT AGG GCC
CCA GGC GAC AAC ACG GGG CGC CCC GTG CGG CAC CCG TAT AAA TCC CGG

    240                 245                 250
Ala Val Cys Thr Arg Gly Val Ala Lys Ala Val Asp Phe Ile Pro Val
GCG GTG TGC ACC CGT GGA GTG GCT AAG GCG GTG GAC TTT ATC CCT GTG
CGC CAC ACG TGG GCA CCT CAC CGA TTC CGC CAC CTG AAA TAG GGA CAC

Figure 1 (continued)

255                 260                 265                 270
Glu Asn Leu Glu Thr Thr Met Arg Ser Pro Val Phe Thr Asp Asn Ser
GAG AAC CTA GAG ACA ACC ATG AGG TCC CCG GTG TTC ACG GAT AAC TCC
CTC TTG GAT CTC TGT TGG TAC TCC AGG GGC CAC AAG TGC CTA TTG AGG



                275                 280                 285
Ser Pro Pro Val Val Pro Gln Ser Phe Gln Val Ala His Leu His Ala
TCT CCA CCA GTA GTG CCC CAG AGC TTC CAG GTG GCT CAC CTC CAT GCT
AGA GGT GGT CAT CAC GGG GTC TCG AAG GTC CAC CGA GTG GAG GTA CGA

            290                 295                 300
Pro Thr Gly Ser Gly Lys Ser Thr Lys Val Pro Ala Ala Tyr Ala Ala
CCC ACA GGC AGC GGC AAA AGC ACC AAG GTC CCG GCT GCA TAT GCA GCT
GGG TGT CCG TCG CCG TTT TCG TGG TTC CAG GGC CGA CGT ATA CGT CGA
                                                 NdeI

        305                 310                 315
Gln Gly Tyr Lys Val Leu Val Leu Asn Pro Ser Val Ala Ala Thr Leu
CAG GGC TAT AAG GTG CTA GTA CTC AAC CCC TCT GTT GCT GCA ACA CTG
GTC CCG ATA TTC CAC GAT CAT GAG TTG GGG AGA CAA CGA CGT TGT GAC

    320                 325                 330
Gly Phe Gly Ala Tyr Met Ser Lys Ala His Gly Ile Asp Pro Asn Ile
GGC TTT GGT GCT TAC ATG TCC AAG GCT CAT GGG ATC GAT CCT AAC ATC
CCG AAA CCA CGA ATG TAC AGG TTC CGA GTA CCC TAG CTA GGA TTG TAG

335                 340                 345                 350
Arg Thr Gly Val Arg Thr Ile Thr Thr Gly Ser Pro Ile Thr Tyr Ser
AGG ACC GGG GTG AGA ACA ATT ACC ACT GGC AGC CCC ATC ACG TAC TCC
TCC TGG CCC CAC TCT TGT TAA TGG TGA CCG TCG GGG TAG TGC ATG AGG

                355                 360                 365
Thr Tyr Gly Lys Phe Leu Ala Asp Gly Gly Cys Ser Gly Gly Ala Tyr
ACC TAC GGC AAG TTC CTT GCC GAC GGC GGG TGC TCG GGG GGC GCT TAT
TGG ATG CCG TTC AAG GAA CGG CTG CCG CCC ACG AGC CCC CCG CGA ATA

            370                 375                 380
Asp Ile Ile Ile Cys Asp Glu Cys His Ser Thr Asp Ala Thr Ser Ile
GAC ATA ATA ATT TGT GAC GAG TGC CAC TCC ACG GAT GCC ACA TCC ATC
CTG TAT TAT TAA ACA CTG CTC ACG GTG AGG TGC CTA CGG TGT AGG TAG

        385                 390                 395
Leu Gly Ile Gly Thr Val Leu Asp Gln Ala Glu Thr Ala Gly Ala Arg
TTG GGC ATT GGC ACT GTC CTT GAC CAA GCA GAG ACT GCG GGG GCG AGA
AAC CCG TAA CCG TGA CAG GAA CTG GTT CGT CTC TGA CGC CCC CGC TCT



    400                 405                 410
Leu Val Val Leu Ala Thr Ala Thr Pro Pro Gly Ser Val Thr Val Pro
CTG GTT GTG CTC GCC ACC GCC ACC CCT CCG GGC TCC GTC ACT GTG CCC
GAC CAA CAC GAG CGG TGG CGG TGG GGA GGC CCG AGG CAG TGA CAC GGG

415                 420                 425                 430
His Pro Asn Ile Glu Glu Val Ala Leu Ser Thr Thr Gly Glu Ile Pro
CAT CCC AAC ATC GAG GAG GTT GCT CTG TCC ACC ACC GGA GAG ATC CCT
GTA GGG TTG TAG CTC CTC CAA CGA GAC AGG TGG TGG CCT CTC TAG GGA

                435                 440                 445
Phe Tyr Gly Lys Ala Ile Pro Leu Glu Val Ile Lys Gly Gly Arg His
TTT TAC GGC AAG GCT ATC CCC CTC GAA GTA ATC AAG GGG GGG AGA CAT
AAA ATG CCG TTC CGA TAG GGG GAG CTT CAT TAG TTC CCC CCC TCT GTA

            450                 455                 460
Leu Ile Phe Cys His Ser Lys Lys Lys Cys Asp Glu Leu Ala Ala Lys
CTC ATC TTC TGT CAT TCA AAG AAG AAG TGC GAC GAA CTC GCC GCA AAG
GAG TAG AAG ACA GTA AGT TTC TTC TTC ACG CTG CTT GAG CGG CGT TTC

        465                 470                 475
Leu Val Ala Leu Gly Ile Asn Ala Val Ala Tyr Tyr Arg Gly Leu Asp
CTG GTC GCA TTG GGC ATC AAT GCC GTG GCC TAC TAC CGC GGT CTT GAC
GAC CAG CGT AAC CCG TAG TTA CGG CAC CGG ATG ATG GCG CCA GAA CTG

[...]

495                 500                 505                 510
Ala Leu Met Thr Gly Tyr Thr Gly Asp Phe Asp Ser Val Ile Asp Cys
GCC CTC ATG ACC GGC TAT ACC GGC GAC TTC GAC TCG GTG ATA GAC TGC
CGG GAG TAC TGG CCG ATA TGG CCG CTG AAG CTG AGC CAC TAT CTG ACG



                515                 520                 525
Asn Thr Cys Val Thr Gln Thr Val Asp Phe Ser Leu Asp Pro Thr Phe
AAT ACG TGT GTC ACC CAG ACA GTC GAT TTC AGC CTT GAC CCT ACC TTC
TTA TGC ACA CAG TGG GTC TGT CAG CTA AAG TCG GAA CTG GGA TGG AAG

            530                 535                 540
Thr Ile Glu Thr Ile Thr Leu Pro Gln Asp Ala Val Ser Arg Thr Gln
ACC ATT GAG ACA ATC ACG CTC CCC CAA GAT GCT GTC TCC CGC ACT CAA
TGG TAA CTC TGT TAG TGC GAG GGG GTT CTA CGA CAG AGG GCG TGA GTT

        545                 550                 555
Arg Arg Gly Arg Thr Gly Arg Gly Lys Pro Gly Ile Tyr Arg Phe Val
CGT CGG GGC AGG ACT GGC AGG GGG AAG CCA GGC ATC TAC AGA TTT GTG
GCA GCC CCG TCC TGA CCG TCC CCC TTC GGT CCG TAG ATG TCT AAA CAC

    560                 565                 570
Ala Pro Gly Glu Arg Pro Pro Gly Met Phe Asp Ser Ser Val Leu Cys
GCA CCG GGG GAG CGC CCT CCC GGC ATG TTC GAC TCG TCC GTC CTC TGT
CGT GGC CCC CTC GCG GGA GGG CCG TAC AAG CTG AGC AGG CAG GAG ACA

575                 580                 585                 590
Glu Cys Tyr Asp Ala Gly Cys Ala Trp Tyr Glu Leu Thr Pro Ala Glu
GAG TGC TAT GAC GCA GGC TGT GCT TGG TAT GAG CTC ACG CCC GCC GAG
CTC ACG ATA CTG CGT CCG ACA CGA ACC ATA CTC GAG TGC GGG CGG CTC

                595                 600                 605
Thr Thr Val Arg Leu Arg Ala Tyr Met Asn Thr Pro Gly Leu Pro Val
ACT ACA GTT AGG CTA CGA GCG TAC ATG AAC ACC CCG GGG CTT CCC GTG
TGA TGT CAA TCC GAT GCT CGC ATG TAC TTG TGG GGC CCC GAA GGG CAC

Figure 1 (continued)


610 










615 










620 


uys 


CI n Ion H4« 

OJLII Aap (1X9 


T AU 


Gin 




Ti**% 


Pill 


Pi X» 


vax 


Dha 

rue 




Gxy L»eu inr 


C TGP 


GAG CAT 


CTT 


GAA 


TTT 

X X A 


TGG 


GAG 


CCP 


GTC 


TTT 


ACA 


PCP PTP A PT 


apg 

ALU 


gtp PTG CPA 

VI1W WXw WJhC* 


GAA 

win 


CTT 


AAA 


ACC 


PTC 


PPG 


CAG 


AAA 


TGT 


PPG GAG TCI 
ULvi VjAVJ L\>A 




625 630 635

His Ile Asp Ala His Phe Leu Ser Gln Thr Lys Gln Ser Gly Glu Asn
CAT ATA GAT GCC CAC TTT CTA TCC CAG ACA AAG CAG AGT GGG GAG AAC
GTA TAT CTA CGG GTG AAA GAT AGG GTC TGT TTC GTC TCA CCC CTC TTG




640 645 650

Leu Pro Tyr Leu Val Ala Tyr Gln Ala Thr Val Cys Ala Arg Ala Gln
CTT CCT TAC CTG GTA GCG TAC CAA GCC ACC GTG TGC GCT AGG GCT CAA
GAA GGA ATG GAC CAT CGC ATG GTT CGG TGG CAC ACG CGA TCC CGA GTT



655 660 665 670 

Ala Pro Pro Pro Ser Trp Asp Gln Met Trp Lys Cys Leu Ile Arg Leu
GCC CCT CCC CCA TCG TGG GAC CAG ATG TGG AAG TGT TTG ATT CGC CTC 
CGG GGA GGG GGT AGC ACC CTG GTC TAC ACC TTC ACA AAC TAA GCG GAG 









675 680 685

Pro Thr Leu His Gly Pro Thr Pro Leu Leu Tyr Arg Leu Gly Ala
CCC ACC CTC CAT GGG CCA ACA CCC CTG CTA TAC AGA CTG GGC GCT
GGG TGG GAG GTA CCC GGT TGT GGG GAC GAT ATG TCT GAC CCG CGA



Asn Ser Glu Asn Gln Val Glu Gly
AAT TCG GAA AAC CAA GTG GAG GGT
TTA AGC CTT TTG GTT CAC CTC CCA 
t 

EcoRI

Glu Val Gln Ile Val Ser Thr Ala

GAG GTC CAG ATT GTG TCA ACT GCT 
CTC CAG GTC TAA CAC AGT TGA CGA 



Ala Gln Thr Phe Leu Ala Thr Cys Ile Asn Gly Val Cys Trp Thr Val

GCC CAA ACC TTC CTG GCA ACG TGC ATC AAT GGG GTG TGC TGG ACT GTC 
CGG GTT TGG AAG GAC CGT TGC ACG TAG TTA CCC CAC ACG ACC TGA CAG 

t 

SfaNI 



Tyr His Gly Ala Gly Thr Arg Thr Ile Ala Ser Pro Lys Gly Pro Val
TAC CAC GGG GCC GGA ACG AGG ACC ATC GCG TCA CCC AAG GGT CCT GTC 
ATG GTG CCC CGG CCT TGC TCC TGG TAG CGC AGT GGG TTC CCA GGA CAG 



Ile Gln Met Tyr Thr Asn Val Asp Gln Asp Leu Val Gly Trp Pro Ala

ATC CAG ATG TAT ACC AAT GTA GAC CAA GAC CTT GTG GGC TGG CCC GCT 
TAG GTC TAC ATA TGG TTA CAT CTG GTT CTG GAA CAC CCG ACC GGG CGA 



Ser Gln Gly Thr Arg Ser Leu Thr Pro Cys Thr Cys Gly Ser Ser Asp
TCG CAA GGT ACC CGC TCA TTG ACA CCC TGC ACT TGC GGC TCC TCG GAC 
AGC GTT CCA TGG GCG AGT AAC TGT GGG ACG TGA ACG CCG AGG AGC CTG 



Leu Tyr Leu Val Thr Arg His Ala Asp Val Ile Pro Val Arg Arg Arg

CTT TAC CTG GTC ACG AGG CAC GCC GAT GTC ATT CCC GTG CGC CGG CGG 
GAA ATG GAC CAG TGC TCC GTG CGG CTA CAG TAA GGG CAC GCG GCC GCC 

' t 

NaeI

Gly Asp Ser Arg Gly Ser Leu Val Ser Pro Arg Pro Ile Ser Tyr Leu
GGT GAT AGC AGG GGC AGC CTC GTG TCG CCC CGG CCC ATT TCC TAC TTG 
CCA CTA TCG TCC CCG TCG GAG CAC AGC GGG GCC GGG TAA AGG ATG AAC 



Lys Gly Ser Ser Gly Gly Pro Leu Pro Asn 
AAA GGC TCC TCG GGG GGT CCG CTG CCG AAT TC
TTT CCG AGG AGC CCC CCA GGC GAC GGC TTA 

t 

EcoRI

C26d:






Glu Phe Gly Gly Leu Leu Leu Cys Pro Ala Ala Ala Val Gly Ile Phe
GAA TTC GGG GGC CTG CTG TTG TGC CCC GCG GCA GCC GTG GGC ATA TTT 
CTT AAG CCC CCG GAC GAC AAC ACG GGG CGC CGT CGG CAC CCG TAT AAA 



Arg Ala Ala Val Cys Thr Arg Gly Val Ala Lys Ala Val Asp Phe Ile

AGG GCC GCG GTG TGC ACC CGT GGA GTG GCT AAG GCG GTG GAC TTT ATC
TCC CGG CGC CAC ACG TGG GCA CCT CAC CGA TTC CGC CAC CTG AAA TAG 



Pro Val Glu Asn Leu Glu Thr Thr Met Arg Ser Pro Val Phe Thr Asp 
CCT GTG GAG AAC CTA GAG ACA ACC ATG AGG TCC CCG GTG TTC ACG GAT
GGA CAC CTC TTG GAT CTC TGT TGG TAC TCC AGG GGC CAC AAG TGC CTA 



Asn Ser Ser Pro Pro Val Val Pro Gln Ser Phe Gln Val Ala His Leu
AAC TCC TCT CCA CCA GTA GTG CCC CAG AGC TTC CAG GTG GCT CAC CTC 
TTG AGG AGA GGT GGT CAT CAC GGG GTC TCG AAG GTC CAC CGA GTG GAG 



t 

EcoRI



t 

DdeI



t 

EcoRII



His Ala Pro 
CAT GCT CCC 
GTA CGA GGG 



Arg 
CGA 
GCT 
t 



Ile
ATT C 
TAA G 



EcoRI

Pro Cys Thr Cys Gly Ser Ser Asp Leu Tyr Leu Val Thr Arg His Ala
CCC TGC ACT TGC GGC TCC TCG GAC CTT TAC CTG GTC ACG AGG CAC GCC
GGG ACG TGA ACG CCG AGG AGC CTG GAA ATG GAC CAG TGC TCC GTG CGG 



Asp Val Ile Pro Val Arg Arg Arg Gly Asp Ser Arg Gly Ser Leu Leu
GAT GTC ATT CCC GTG CGC CGG CGG GGT GAT AGC AGG GGC AGC CTG CTG 
CTA CAG TAA GGG CAC GCG GCC GCC CCA CTA TCG TCC CCG TCG GAC GAC 



Ser Pro Arg Pro Ile Ser Tyr Leu Lys Gly Ser Ser Gly Gly Pro Leu
TCG CCC CGG CCC ATT TCC TAC TTG AAA GGC TCC TCG GGG GGT CCG CTG 
AGC GGG GCC GGG TAA AGG ATG AAC TTT CCG AGG AGC CCC CCA GGC GAC 



Leu Cys Pro Ala Gly His Ala Val Gly Ile Phe Arg Ala Ala Val Cys
TTG TGC CCC GCG GGG CAC GCC GTG GGC ATA TTT AGG GCC GCG GTG TGC 
AAC ACG GGG CGC CCC GTG CGG CAC CCG TAT AAA TCC CGG CGC CAC ACG 



Thr Arg Gly Val Ala Lys Ala Val Asp Phe Ile Pro Val Glu Asn Leu
ACC CGT GGA GTG GCT AAG GCG GTG GAC TTT ATC CCT GTG GAG AAC CTA 
TGG GCA CCT CAC CGA TTC CGC CAC CTG AAA TAG GGA CAC CTC TTG GAT 



t 

DdeI



Glu Thr Thr Met Arg Ser 
GAG ACA ACC ATG AGG TCC 
CTC TGT TGG TAC TCC AGG 



Pro 

CCG 
GGC 



Val Phe Thr 
GTG TTC ACG 
CAC AAG TGC 



Asp Asn Ser 
GAT AAC TCC TC 
CTA TTG AGG AG 



Figure 4

Ile Arg Gly Thr Tyr Val Tyr Asn

ATT CGG GGC ACC TAT GTT TAT AAC 

TAA GCC CCG TGG ATA CAA ATA TTG 
t 

EcoRI



His Leu Thr Pro Leu Arg Asp Trp 
CAT CTC ACT CCT CTT CGG GAC TGG 
GTA GAG TGA GGA GAA GCC CTG ACC 



Ala His Asn Gly Leu Arg Asp Leu Ala Val Ala Val Glu Pro Val Val 
GCG CAC AAC GGC TTG CGA GAT CTG GCC GTG GCT GTA GAG CCA GTC GTC 
CGC GTG TTG CCG AAC GCT CTA GAC CGG CAC CGA CAT CTC GGT CAG CAG 



Phe Ser Gln Met Glu Thr Lys Leu Ile Thr Trp Gly Ala Asp Thr Ala
TTC TCC CAA ATG GAG ACC AAG CTC ATC ACG TGG GGG GCA GAT ACC GCC 
AAG AGG GTT TAC CTC TGG TTC GAG TAG TGC ACC CCC CGT CTA TGG CGG 



Ala Cys Gly Asp Ile Ile Asn Gly Leu Pro Val Ser Ala Arg Arg Gly
GCG TGC GGT GAC ATC ATC AAC GGC TTG CCT GTT TCC GCC CGC AGG GGC 
CGC ACG CCA CTG TAG TAG TTG CCG AAC GGA CAA AGG CGG GCG TCC CCG 



Arg Glu Ile Leu Leu Gly Pro Ala Asp Gly Met Val Ser Lys Gly Trp
CGG GAG ATA CTG CTC GGG CCA GCC GAT GGA ATG GTC TCC AAG GGT TGG
GCC CTC TAT GAC GAG CCC GGT CGG CTA CCT TAC CAG AGG TTC CCA ACC



Arg Leu Leu Ala Pro Ile Thr Ala Tyr Ala Gln Gln Thr Arg Gly Leu
AGG TTG CTG GCG CCC ATC ACG GCG TAC GCC CAG CAG ACA AGG GGC CTC
TCC AAC GAC CGC GGG TAG TGC CGC ATG CGG GTC GTC TGT TCC CCG GAG



Leu Gly Cys Ile Ile Thr Ser Leu Thr Gly Arg Asp Lys Asn Gln Val
CTA GGG TGC ATA ATC ACC AGC CTA ACT GGC CGG GAC AAA AAC CAA GTG
GAT CCC ACG TAT TAG TGG TCG GAT TGA CCG GCC CTG TTT TTG GTT CAC



Glu Gly Glu Val Gln Ile Val Ser Thr Ala Ala Gln Thr Phe Leu Ala
GAG GGT GAG GTC CAG ATT GTG TCA ACT GCT GCC CAA ACC TTC CTG GCA
CTC CCA CTC CAG GTC TAA CAC AGT TGA CGA CGG GTT TGG AAG GAC CGT

Thr Cys Ile Asn Gly
ACG TGC ATC AAT GGG 
TGC ACG TAG TTA CCC 
T 

SfaNI



Val Cys Trp Pro Asn 
GTG TGC TGG CCG AAT TC 
CAC ACG ACC GGC TTA AG 

t 

EcoRI

Glu Phe Gly Ser
GAA TTC GGG TCC
CTT AAG CCC AGG 
t 

EcoRI

Val

GTC 
CAG 



Ile
ATC 
TAG 



Pro Thr 

CCG ACC 
GGC TGG 



Ser Gly 

AGC GGC 
TCG CCG 



Asp Val Val Val 

GAT GTT GTC GTC 
CTA CAA CAG CAG 



Val Ala 

GTC GCA 

CAG CGT



Thr Asp Ala Leu Met Thr Gly Tyr Thr Gly Asp 
ACC GAT GCC CTC ATG ACC GGC TAT ACC GGC GAC 
TGG CTA CGG GAG TAC TGG CCG ATA TGG CCG CTG 



Asp Cys Asn Thr Cys Val Thr Gln Thr Val Asp
GAC TGC AAT ACG TGT GTC ACC CAG ACA GTC GAT
CTG ACG TTA TGC ACA CAG TGG GTC TGT CAG CTA



Phe Asp Ser 
TTC GAC TCG 
AAG CTG AGC 
t 

HinfI



Phe Ser Leu
TTC AGC CTT 
AAG TCG GAA 



Val Ile

GTG ATA 
CAC TAT 



Asp Pro
GAC CCT
CTG GGA



Thr Phe 
ACC TTC 
TGG AAG 



Thr Gln
ACT CAA
TGA GTT 



Phe Val 
TTT GTG 
AAA CAC 



Cys 
TGT 
ACA 



Thr Ile Glu Thr Ile Thr Leu Pro
ACC ATT GAG ACA ATC ACG CTC CCC 
TGG TAA CTC TGT TAG TGC GAG GGG 



Arg Arg Gly Arg Thr Gly Arg Gly 
CGT CGG GGC AGG ACT GGC AGG GGG 
GCA GCC CCG TCC TGA CCG TCC CCC 



Ala Pro Gly Glu Arg Pro Ser Gly 
GCA CCG GGG GAG CGC CCC TCC GGC 
CGT GGC CCC CTC GCG GGG AGG CCG 

t 

Bgll 



Glu Cys Pro Asn
GAG TGC CCG AAT TC 
CTC ACG GGC TTA AG 



Gln Asp Ala Val Ser
CAA GAT GCT GTC TCC 
GTT CTA CGA CAG AGG 



Lys Pro Gly Ile Tyr
AAG CCA GGC ATC TAC 
TTC GGT CCG TAG ATG 



Met Phe Asp Ser Ser
ATG TTC GAC TCG TCC 
TAC AAG CTG AGC AGG 
t 

HinfI



Arg 

GCG

Val



GTC
CAG

Ile

ATT 
TAA 


Arg Ser 
CGG TCC 
GCC AGG 


Ile

ATT 

TAA 


Glu
GAG 

CTC 


Thr 
ACA 
TGT 


Ile

ATC 

TAG 


Thr 

ACG 
















Thr 
ACT 
TGA 


Gin Arg Arg 
CAA CGT CGG 
GTT GCA GCC 


Gly 

GGC 


CCG 


Arg Thr 

AGG ACT 
TCC TGA 


Gly 

GGC 
CCG 


Phe 
TTT 
AAA 


val 

GTG 
CAC 


Ala 

GCA 
CGT 


Pro 
CCG 
GGC 


Gly 
GGG 
CCC 


Glu Arg 
GAG CGC 
CTC GCG 


Pro 
CCC 
GGG 














Bgll 


Leu 
CTC 
GAG 


Cys Glu Cys 
TGT GAG TGC 
ACA CTC ACG 


Tyr
TAT 
ATA 


Asp Ala 
GAC GCA 
CTG CGT 


Gly 
GGC 
CCG 


Ala 
GCC 
CGG 


Glu
GAG 
CTC 


Thr 

ACT 
TGA 


Thr 
ACA 
TGT 


Val

GTT 
CAA 


Arg Leu 
AGG CTA 
TCC GAT 


Arg 
CGA 
GCT 


Pro 
CCC 
GGG 


Val 
GTG 
CAC 


Cys 
TGC 
ACG 


Gin 
CAG 
GTC 


Asp 
GAC 
CTG 


His 
CAT 
GTA 


Leu 
CTT 
GAA 


Glu
GAA 
CTT 


Leu 
CTC 
GAG 


Thr His 
ACT CAT 
TGA GTA 


Ile
ATA 
TAT 


Asp

GAT 
CTA 


Ala 

GCC 
CGG 


His 
CAC 
GTG 


Phe 
TTT 
AAA 


Glu
GAG 
CTC 


Asn 
AAC 
TTG 


Leu 
CTT 
GAA 


Pro 

CCT 
GGA 


Tyr 
TAC 
ATG 


Leu 
CTG 
GAC 


Val 
GTA 
CAT 


Ala 

GCG 
CGC 


Ala 
GCT 
CGA 


Gin 
CAA 
GTT 


Ala 
GCC 
CGG 


Pro 
CCT 
GGA 


Pro
CCC 
GGG 


Pro 
CCA 
GGT 


Ser 
TCG 
AGC 


Trp 

TGG 
ACC 


Arg 
CGC 
GCG 


Leu Lys 
CTC AAG 
GAG TTC 


Pro 

CCC 
GGG 


Thr 

ACC 
TGG 


Leu 
CTC 
GAG 


His 
CAT 
GTA 


Gly 
GGG 
CCC 


Gly 
GGC 
CCG 


Ala 

GCT 
CGA 


Ala 

GCC 
CGG 


Glu 
GAA 
CTT 


Phe 
TTC 
AAG 









t

Leu Pro Gln Asp Ala Val Ser Arg
CTC CCC CAG GAT GCT GTC TCC CGC 
GAG GGG GTC CTA CGA CAG AGG GCG 



Arg Gly Lys Pro Gly Ile Tyr Arg
AGG GGG AAG CCA GGC ATC TAC AGA
TCC CCC TTC GGT CCG TAG ATG TCT 

Ser Gly Met Phe Asp Ser Ser Val
TCC GGC ATG TTC GAC TCG TCC GTC 
AGG CCG TAC AAG CTG AGC AGG CAG 



Cys Ala Trp Tyr Glu Leu Thr Pro
TGT GCT TGG TAT GAG CTC ACG CCC 
ACA CGA ACC ATA CTC GAG TGC GGG 

Ala Tyr Met Asn Thr Pro Gly Leu 
GCG TAC ATG AAC ACC CCG GGG CTT 
CGC ATG TAC TTG TGG GGC CCC GAA 

Phe Trp Glu Gly Val Phe Thr Gly 
TTT TGG GAG GGC GTC TTT ACA GGC 
AAA ACC CTC CCG CAG AAA TGT CCG 

Leu Ser Gln Thr Lys Gln Ser Gly
CTA TCC CAG ACA AAG CAG AGT GGG 
GAT AGG GTC TGT TTC GTC TCA CCC 

Tyr Gln Ala Thr Val Cys Ala Arg
TAC CAA GCC ACC GTG TGC GCT AGG 
ATG GTT CGG TGG CAC ACG CGA TCC 

Asp Gln Met Trp Lys Cys Leu Ile
GAC CAG ATG TGG AAG TGT TTG ATT 
CTG GTC TAC ACC TTC ACA AAC TAA 

Pro Thr Pro Leu Leu Tyr Arg Leu 
CCA ACA CCC CTG CTA TAC AGA CTG 
GGT TGT GGG GAC GAT ATG TCT GAC 



Figure 7

Glu Phe Gly Ala Val Asp Phe Ile Pro Val Glu Asn Leu Glu Thr Thr
GAA TTC GGG GCG GTG GAC TTT ATC CCT GTG GAG AAC CTA GAG ACA ACC 
CTT AAG CCC CGC CAC CTG AAA TAG GGA CAC CTC TTG GAT CTC TGT TGG 
t 

EcoRI 

Met Arg Ser Pro Val Phe Thr Asp Asn Ser Ser Pro Pro Val Val Pro 
ATG AGG TCC CCG GTG TTC ACG GAT AAC TCC TCT CCA CCA GTA GTG CCC 
TAC TCC AGG GGC CAC AAG TGC CTA TTG AGG AGA GGT GGT CAT CAC GGG 

Gln Ser Phe Gln Val Ala His Leu His Ala Pro Thr Gly Ser Gly Lys

CAG AGC TTC CAG GTG GCT CAC CTC CAT GCT CCC ACA GGC AGC GGC AAA 
GTC TCG AAG GTC CAC CGA GTG GAG GTA CGA GGG TGT CCG TCG CCG TTT

Ser Thr Lys Val Pro Ala Ala Tyr Ala Ala Gln Gly Tyr Lys Val Leu
AGC ACC AAG GTC CCG GCT GCA TAT GCA GCT CAG GGC TAT AAG GTG CTA 
TCG TGG TTC CAG GGC CGA CGT ATA CGT CGA GTC CCG ATA TTC CAC GAT 

Val Leu Asn Pro Ser Val Ala Ala Thr Leu Gly Phe Gly Ala Tyr Met 
GTA CTC AAC CCC TCT GTT GCT GCA ACA CTG GGC TTT GGT GCT TAC ATG 
CAT GAG TTG GGG AGA CAA CGA CGT TGT GAC CCG AAA CCA CGA ATG TAC 

Ser Lys Ala His Gly Ile Asp Pro Asn Ile Arg Thr Gly Val Arg Thr
TCC AAG GCT CAT GGG ATC GAT CCT AAC ATC AGG ACC GGG GTG AGA ACA 
AGG TTC CGA GTA CCC TAG CTA GGA TTG TAG TCC TGG CCC CAC TCT TGT 

Ile Thr Thr Gly Ser Pro Ile Thr Tyr Ser Thr Tyr Gly Lys Phe Leu
ATT ACC ACT GGC AGC CCC ATC ACG TAC TCC ACC TAC GGC AAG TTC CTT 
TAA TGG TGA CCG TCG GGG TAG TGC ATG AGG TGG ATG CCG TTC AAG GAA 

Ala Asp Gly Gly Cys Ser Gly Gly Ala Tyr Asp Ile Ile Ile Cys Asp
GCC GAC GGC GGG TGC TCG GGG GGC GCT TAT GAC ATA ATA ATT TGT GAC
CGG CTG CCG CCC ACG AGC CCC CCG CGA ATA CTG TAT TAT TAA ACA CTG 

Glu Cys His Ser Thr Asp Ala Thr Ser Ile Leu Gly Ile Gly Thr Val
GAG TGC CAC TCC ACG GAT GCC ACA TCC ATC TTG GGC ATT GGC ACT GTC 
CTC ACG GTG AGG TGC CTA CGG TGT AGG TAG AAC CCG TAA CCG TGA CAG

Leu Asp Gln




CTT GAC CAA 


GAA CTG GTT

Ala Thr Pro 




GCC ACC CCT




CGG TGG GGA 








Val Ala Leu 




GTT GCT CTG 




CAA CGA GAC 




Pro Leu Glu 




CCC CTC GAA




GGG GAG CTT

Ala Gly Ala Arg Leu Val Val Leu Ala Thr

GCG GGG GCG AGA CTG GTT GTG CTC GCC ACC 
CGC CCC CGC TCT GAC CAA CAC GAG CGG TGG 





fcUP 



Lys Lys Lys Cys Asp Glu 
AAG AAG AAG TGC GAC GAA 
TTC TTC TTC ACG CTG CTT



Asn Ala Val Ala Tyr Tyr
AAT GCC GTG GCC TAC TAC
TTA CGG CAC CGG ATG ATG 



Ser Gly Asp Val Val Val

AGC GGC GAT GTT GTC GTC
TCG CCG CTA CAA CAG CAG



Thr Gly Asp Phe Asp Ser
ACC GGC GAC TTC GAC TCG 
TGG CCG CTG AAG CTG AGC 

HinfI



Val Thr Val Pro His Pro 
GTC ACT GTG CCC CAT CCC 
CAG TGA CAC GGG GTA GGG 



Gly Glu Ile Pro Phe Tyr
GGA GAG ATC CCT TTT TAC 
CCT CTC TAG GGA AAA ATG 



Gly Gly Arg His Leu Ile
GGG GGG AGA CAT CTC ATC 
CCC CCC TCT GTA GAG TAG 



Leu Ala Ala Lys Leu Val 
CTC GCC GCA AAG CTG GTC 
GAG CGG CGT TTC GAC CAG 



Arg Gly Leu Asp Val Ser
CGC GGT CTT GAC GTG TCC
GCG CCA GAA CTG CAC AGG 



Val Ala Thr Asp Ala Leu 
GTG GCA ACC GAT GCC CTC
CAC CGT TGG CTA CGG GAG 



Asn Ile Glu Glu
AAC ATC GAG GAG
TTG TAG CTC CTC 



Gly Lys Ala Ile
GGC AAG GCT ATC
CCG TTC CGA TAG






Phe Cys His Ser
TTC TGT CAT TCA
AAG ACA GTA AGT



Ala Leu Gly Ile
GCA TTG GGC ATC 
CGT AAC CCG TAG 



Val Ile Pro Thr

GTC ATC CCG ACC
CAG TAG GGC TGG



Met Thr Gly Tyr
ATG ACC GGC TAT
TAC TGG CCG ATA



Val Ile Asp Cys Asn Thr
GTG ATA GAC TGC AAT ACG
CAC TAT CTG ACG TTA TGC




Cys Ala

TGT GCC GAA TTC
C35
C33c



C200



SfaNI
C7f



C20c
NdeI



C8h



EcoRII
C26d
C300
C300



C7f C20c C300



NdeI



C200



C7f C20c C300 C200
-155 -150
Met Ala Thr Asn Pro Val Cys Val Leu

ATG GCT ACA AAC CCT GTT TGC GTT TTG 
TAC CGA TGT TTG GGA CAA ACG CAA AAC 

-145 -140 -135 

Lys Gly Asp Gly Pro Val Gln Gly Ile Ile Asn Phe Glu Gln Lys Glu
AAG GGT GAC GGC CCA GTT CAA GGT ATT ATT AAC TTC GAG CAG AAG GAA 
TTC CCA CTG CCG GGT CAA GTT CCA TAA TAA TTG AAG CTC GTC TTC CTT 

-130 -125 -120 -115

Ser Asn Gly Pro Val Lys Val Trp Gly Ser Ile Lys Gly Leu Thr Glu
AGT AAT GGA CCA GTG AAG GTG TGG GGA AGC ATT AAA GGA CTG ACT GAA
TCA TTA CCT GGT CAC TTC CAC ACC CCT TCG TAA TTT CCT GAC TGA CTT 

-110 -105 -100 

Gly Leu His Gly Phe His Val His Glu Phe Gly Asp Asn Thr Ala Gly 
GGC CTG CAT GGA TTC CAT GTT CAT GAG TTT GGA GAT AAT ACA GCA GGC 
CCG GAC GTA CCT AAG GTA CAA GTA CTC AAA CCT CTA TTA TGT CGT CCG

-95 -90 -85 

Cys Thr Ser Pro Gly Pro His Phe Asn Pro Leu Ser Arg Lys His Gly 
TGT ACC AGT CCA GGT CCT CAC TTT AAT CCT CTA TCC AGA AAA CAC GGT 
ACA TGG TCA GGT CCA GGA GTG AAA TTA GGA GAT AGG TCT TTT GTG CCA 

-80 -75 -70

Gly Pro Lys Asp Glu Glu Arg His Val
GGG CCA AAG GAT GAA GAG AGG CAT GTT
CCC GGT TTC CTA CTT CTC TCC GTA CAA



-65 -60 -55

Ala Asp Lys Asp Gly Val Ala Asp Val
GCT GAC AAA GAT GGT GTG GCC GAT GTG
CGA CTG TTT CTA CCA CAC CGG CTA CAC


-50 -45 -40 -35

Ser Leu Ser Gly Asp His Cys Ile Ile
TCA CTC TCA GGA GAC CAT TGC ATC ATT
AGT GAG AGT CCT CTG GTA ACG TAG TAA



-30 -25 -20

Glu Lys Ala Asp Asp Leu Gly Lys Gly Gly Asn Glu Glu Ser Thr Lys
GAA AAA GCA GAT GAC TTG GGC AAA GGT GGA AAT GAA GAA AGT ACA AAG
CTT TTT CGT CTA CTG AAC CCG TTT CCA CCT TTA CTT CTT TCA TGT TTC










-15 -10 -5

Thr Gly Asn Ala Gly Ser Arg Leu Ala Cys Gly Val Ile Gly Ile Arg
ACA GGA AAC GCT GGA AGT CGT TTG GCT TGT GGT GTA ATT GGG ATC CGA
TGT CCT TTG CGA CCT TCA GCA AAC CGA ACA CCA CAT TAA CCC TAG GCT
1 5 10

Ile Arg Gly Thr Tyr Val Tyr Asn His Leu Thr Pro Leu Arg Asp Trp
ATT CGG GGC ACC TAT GTT TAT AAC CAT CTC ACT CCT CTT CGG GAC TGG 
TAA GCC CCG TGG ATA CAA ATA TTG GTA GAG TGA GGA GAA GCC CTG ACC 

15 20 25 30 

Ala His Asn Gly Leu Arg Asp Leu Ala Val Ala Val Glu Pro Val Val 

GCG CAC AAC GGC TTG CGA GAT CTG GCC GTG GCT GTA GAG CCA GTC GTC 

CGC GTG TTG CCG AAC GCT CTA GAC CGG CAC CGA CAT CTC GGT CAG CAG 

35 40 45 

Phe Ser Gln Met Glu Thr Lys Leu Ile Thr Trp Gly Ala Asp Thr Ala
TTC TCC CAA ATG GAG ACC AAG CTC ATC ACG TGG GGG GCA GAT ACC GCC 
AAG AGG GTT TAC CTC TGG TTC GAG TAG TGC ACC CCC CGT CTA TGG CGG 

50 55 60 

Ala Cys Gly Asp Ile Ile Asn Gly Leu Pro Val Ser Ala Arg Arg Gly
GCG TGC GGT GAC ATC ATC AAC GGC TTG CCT GTT TCC GCC CGC AGG GGC 
CGC ACG CCA CTG TAG TAG TTG CCG AAC GGA CAA AGG CGG GCG TCC CCG 



65 70 
Arg Glu Ile Leu Leu Gly Pro Ala
CGG GAG ATA CTG CTC GGG CCA GCC 
GCC CTC TAT GAC GAG CCC GGT CGG 



75 

Asp Gly Met Val Ser Lys Gly Trp 
GAT GGA ATG GTC TCC AAG GGT TGG 
CTA CCT TAC CAG AGG TTC CCA ACC 



80 

Arg Leu Leu Ala
AGG TTG CTG GCG 
TCC AAC GAC CGC 



85 

Pro Ile Thr Ala
CCC ATC ACG GCG 
GGG TAG TGC CGC 



90 

Tyr Ala Gln Gln

TAC GCC CAG CAG 
ATG CGG GTC GTC 



Thr Arg Gly Leu 

ACA AGG GGC CTC 
TGT TCC CCG GAG 



95 100 105 110 

Leu Gly Cys Ile Ile Thr Ser Leu Thr Gly Arg Asp Lys Asn Gln Val
CTA GGG TGC ATA ATC ACC AGC CTA ACT GGC CGG GAC AAA AAC CAA GTG 
GAT CCC ACG TAT TAG TGG TCG GAT TGA CCG GCC CTG TTT TTG GTT CAC 



115 120 125 

Glu Gly Glu Val Gln Ile Val Ser Thr Ala Ala Gln Thr Phe Leu Ala
GAG GGT GAG GTC CAG ATT GTG TCA ACT GCT GCC CAA ACC TTC CTG GCA
CTC CCA CTC CAG GTC TAA CAC AGT TGA CGA CGG GTT TGG AAG GAC CGT



130 135 140

Thr Cys Ile Asn Gly Val Cys Trp Thr Val Tyr His Gly Ala Gly Thr
ACG TGC ATC AAT GGG GTG TGC TGG ACT GTC TAC CAC GGG GCC GGA ACG
TGC ACG TAG TTA CCC CAC ACG ACC TGA CAG ATG GTG CCC CGG CCT TGC



145 150 155

Arg Thr Ile Ala Ser Pro Lys Gly Pro Val Ile Gln Met Tyr Thr Asn
AGG ACC ATC GCG TCA CCC AAG GGT CCT GTC ATC CAG ATG TAT ACC AAT
TCC TGG TAG CGC AGT GGG TTC CCA GGA CAG TAG GTC TAC ATA TGG TTA



160 165 170 175

Val Asp Gln Asp Leu Val Gly Trp Pro Ala Ser Gln Gly Thr Arg Ser
GTA GAC CAA GAC CTT GTG GGC TGG CCC GCT TCG CAA GGT ACC CGC TCA
CAT CTG GTT CTG GAA CAC CCG ACC GGG CGA AGC GTT CCA TGG GCG AGT



180 185 190

Leu Thr Pro Cys Thr Cys Gly Ser Ser Asp Leu Tyr Leu Val Thr Arg
TTG ACA CCC TGC ACT TGC GGC TCC TCG GAC CTT TAC CTG GTC ACG AGG
AAC TGT GGG ACG TGA ACG CCG AGG AGC CTG GAA ATG GAC CAG TGC TCC



200

His Ala Asp Val Ile Pro Val Arg Arg Arg Gly Asp Ser Arg Gly Ser
CAC GCC GAT GTC ATT CCC GTG CGC CGG CGG GGT GAT AGC AGG GGC AGC
GTG CGG CTA CAG TAA GGG CAC GCG GCC GCC CCA CTA TCG TCC CCG TCG

t

NaeI



210 215 220

Leu Leu Ser Pro Arg Pro Ile Ser Tyr Leu Lys Gly Ser Ser Gly Gly
CTG CTG TCG CCC CGG CCC ATT TCC TAC TTG AAA GGC TCC TCG GGG GGT
GAC GAC AGC GGG GCC GGG TAA AGG ATG AAC TTT CCG AGG AGC CCC CCA



225 230 235

Pro Leu Leu Cys Pro Ala Gly His Ala Val Gly Ile Phe Arg Ala Ala
CCG CTG TTG TGC CCC GCG GGG CAC GCC GTG GGC ATA TTT AGG GCC GCG
GGC GAC AAC ACG GGG CGC CCC GTG CGG CAC CCG TAT AAA TCC CGG CGC



240 245 250

Val Cys Thr Arg Gly Val Ala Lys Ala Val Asp Phe Ile Pro Val
GTG TGC ACC CGT GGA GTG GCT AAG GCG GTG GAC TTT ATC CCT GTG
CAC ACG TGG GCA CCT CAC CGA TTC CGC CAC CTG AAA TAG GGA CAC



Figure 10 (continued)

255

Glu Asn Leu
GAG AAC CTA
CTC TTG GAT



Ser Pro Pro 
TCT CCA CCA 
AGA GGT GGT 



Glu Thr 

GAG ACA 
CTC TGT 



275 

Val Val

GTA GTG 
CAT CAC 



260 

Thr Met 
ACC ATG 
TGG TAC 



Pro Gln

CCC CAG 
GGG GTC

Arg Ser Pro
AGG TCC CCG
TCC AGG GGC

265
Val 
GTG 
CAC 



280 

Ser Phe Gln Val
AGC TTC CAG GTG 
TCG AAG GTC CAC 



Phe 
TTC 
AAG 



Ala 

GCT 
CGA 



Thr Asp 

ACG GAT 
TGC CTA 



His Leu 
CAC CTC 
GTG GAG 



270 

Asn Ser 
AAC TCC 
TTG AGG 



285 

His Ala 

CAT GCT

GTA CGA 



290 

Pro Thr Gly Ser 
CCC ACA GGC AGC 
GGG TGT CCG TCG 



305 

Gln Gly Tyr Lys

CAG GGC TAT AAG 
GTC CCG ATA TTC 



320 

Gly Phe Gly Ala 
GGC TTT GGT GCT 
CCG AAA CCA CGA 



335

Arg Thr Gly Val
AGG ACC GGG GTG
TCC TGG CCC CAC



Thr Tyr Gly

ACC TAC GGC
TGG ATG CCG



Lys

AAG 
TTC 



295 300 
Gly Lys Ser Thr Lys Val Pro Ala Ala Tyr 
GGC AAA AGC ACC AAG GTC CCG GCT GCA TAT 
CCG TTT TCG TGG TTC CAG GGC CGA CGT ATA 

t 

NdeI

310 315 
Val Leu Val Leu Asn Pro Ser Val Ala Ala 
GTG CTA GTA CTC AAC CCC TCT GTT GCT GCA 
CAC GAT CAT GAG TTG GGG AGA CAA CGA CGT 



325 330 
Tyr Met Ser Lys Ala His Gly Ile Asp Pro
TAC ATG TCC AAG GCT CAT GGG ATC GAT CCT
ATG TAC AGG TTC CGA GTA CCC TAG CTA GGA 



340 345

Arg Thr Ile Thr Thr Gly Ser Pro Ile Thr
AGA ACA ATT ACC ACT GGC AGC CCC ATC ACG
TCT TGT TAA TGG TGA CCG TCG GGG TAG TGC 



355 360

Phe Leu Ala Asp Gly Gly Cys Ser Gly Gly

TTC CTT GCC GAC GGC GGG TGC TCG GGG GGC
AAG GAA CGG CTG CCG CCC ACG AGC CCC CCG



Ala Ala 

GCA GCT 
CGT CGA



Thr Leu 
ACA CTG 
TGT GAC 







GAC ATA ATA
CTG TAT TAT




370 
Ile

ATT

TAA



375 380

Cys Asp Glu Cys His Ser Thr Asp Ala Thr
TGT GAC GAG TGC CAC TCC ACG GAT GCC ACA
ACA CTG CTC ACG GTG AGG TGC CTA CGG TGT



Figure 10 (continued)



Asn Ile
AAC ATC
TTG TAG

365

Ala Tyr
GCT TAT
CGA ATA



Ser Ile
TCC ATC
AGG TAG
385 390
Leu Gly Ile Gly Thr Val Leu Asp
TTG GGC ATT GGC ACT GTC CTT GAC 
AAC CCG TAA CCG TGA CAG GAA CTG 



395 

Gln Ala Glu Thr Ala Gly Ala Arg
CAA GCA GAG ACT GCG GGG GCG AGA 
GTT CGT CTC TGA CGC CCC CGC TCT 



400 

Leu Val Val Leu 
CTG GTT GTG CTC 
GAC CAA CAC GAG 



405 

Ala Thr Ala Thr 
GCC ACC GCC ACC 
CGG TGG CGG TGG 



410 

Pro Pro Gly Ser 
CCT CCG GGC TCC 
GGA GGC CCG AGG 



Val Thr Val Pro
GTC ACT GTG CCC 
CAG TGA CAC GGG 



415 420 
His Pro Asn Ile Glu Glu Val Ala
CAT CCC AAC ATC GAG GAG GTT GCT 
GTA GGG TTG TAG CTC CTC CAA CGA 



425 430 

Leu Ser Thr Thr Gly Glu Ile Pro
CTG TCC ACC ACC GGA GAG ATC CCT 
GAC AGG TGG TGG CCT CTC TAG GGA 



Phe Tyr Gly Lys 
TTT TAC GGC AAG 
AAA ATG CCG TTC 



435 

Ala Ile Pro Leu
GCT ATC CCC CTC 
CGA TAG GGG GAG 



440 

Glu Val Ile Lys
GAA GTA ATC AAG 
CTT CAT TAG TTC 



445 

Gly Gly Arg His 
GGG GGG AGA CAT 
CCC CCC TCT GTA 



450 455 460 

Leu Ile Phe Cys His Ser Lys Lys Lys Cys Asp Glu Leu Ala Ala Lys
CTC ATC TTC TGT CAT TCA AAG AAG AAG TGC GAC GAA CTC GCC GCA AAG 
GAG TAG AAG ACA GTA AGT TTC TTC TTC ACG CTG CTT GAG CGG CGT TTC 



465 470 475 

Leu Val Ala Leu Gly Ile Asn Ala Val Ala Tyr Tyr Arg Gly Leu Asp
CTG GTC GCA TTG GGC ATC AAT GCC GTG GCC TAC TAC CGC GGT CTT GAC 
GAC CAG CGT AAC CCG TAG TTA CGG CAC CGG ATG ATG GCG CCA GAA CTG 



480 485 490 

Val Ser Val Ile Pro Thr Ser Gly Asp Val Val Val Val Ala Thr Asp
GTG TCC GTC ATC CCG ACC AGC GGC GAT GTT GTC GTC GTG GCA ACC GAT
CAC AGG CAG TAG GGC TGG TCG CCG CTA CAA CAG CAG CAC CGT TGG CTA